Closed MaN0bhiR closed 1 year ago
It seems that the WavLM feature length is not consistent with the spectrogram length. Delete all '.spec.pt' files, make sure the wavs in the wav dir (named 'DUMMY') are all 16 kHz, and run again?
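The two checks above can be scripted. A minimal, stdlib-only sketch (the helper name `clean_and_check` and the `"DUMMY"` path are illustrative; this is not part of the FreeVC repo):

```python
import wave
from pathlib import Path

def clean_and_check(wav_dir, target_sr=16000):
    """Delete cached '*.spec.pt' files and list wavs not at target_sr."""
    wav_dir = Path(wav_dir)
    # Remove stale cached spectrograms so they are regenerated on the next run.
    for spec in wav_dir.rglob("*.spec.pt"):
        spec.unlink()
    # Collect every wav whose sample rate differs from the target.
    bad = []
    for wav_path in wav_dir.rglob("*.wav"):
        with wave.open(str(wav_path), "rb") as wf:
            if wf.getframerate() != target_sr:
                bad.append((wav_path, wf.getframerate()))
    return bad

for path, sr in clean_and_check("DUMMY"):
    print(f"{path}: {sr} Hz, expected 16000")
```

Any file it prints would need to be re-run through downsample.py before training.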
Well, I have .pt files in the dataset/wavlm folders (not .spec.pt), generated using preprocess_ssl.py (I also confirmed that I am passing the downsampled 16 kHz wav folder in the --in_dir argument). I'll try excluding the folders with a different sampling rate to see if that solves the issue.
I had the same problem. Using the former versions of data_utils.py and train.py, which didn't contain TextAudioSpeakerCollate(), will temporarily solve it.
Thank you, it worked. @OlaWod, does using the older versions of these files affect performance?
Yes, it has more distortions according to my small-scale test.
The model I fine-tuned is messing up the content and pronunciation, but the voices are really accurate. Do you think using the newer version of data_utils.py would help, or is it simply a case of overfitting?
I think it's more likely because of overfitting.
https://github.com/OlaWod/FreeVC/issues/57#issuecomment-1426857492
Try this in the release version.
Hey, after fine-tuning, though the voices are more accurate, the content is kind of messed up. But when I run the same conversion without fine-tuning (on the provided pre-trained models), the content is preserved while the voices are inaccurate. Do you think more training (fine-tuning a little longer) would help? My dataset contains sound files from 10 different speakers with a total duration of 2 hours; is that too little? If not, how many epochs of training would be ideal for that?
Hello @MaN0bhiR, how did you end up improving the quality of the fine-tuned converted voices? Are 2 hours of audio enough for fine-tuning?
Hi, I am trying to fine-tune the FreeVC-s model with a small dataset formatted in VCTK format. I have made the necessary changes in the config file, and I am not using SR-based augmentation. But when I run train.py, it throws me this error.
PS: In my dataset, a few speakers have wav files at a 22050 Hz sampling rate, while other speakers' wav files are at 32000 Hz. But I have used downsample.py to ensure that all of them are downsampled to 16000 Hz.
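Since the source speakers have mixed sampling rates, it may help to audit the raw dataset before running downsample.py. A stdlib-only sketch (the function name and the `"dataset/raw"` path are placeholders, not part of the repo), assuming each speaker has their own subfolder as in the VCTK layout:

```python
import wave
from pathlib import Path
from collections import defaultdict

def sample_rates_by_speaker(dataset_dir):
    """Map each speaker folder to the set of sample rates found in it."""
    rates = defaultdict(set)
    for wav_path in Path(dataset_dir).rglob("*.wav"):
        # Use the immediate parent folder name as the speaker ID.
        with wave.open(str(wav_path), "rb") as wf:
            rates[wav_path.parent.name].add(wf.getframerate())
    return dict(rates)

for speaker, rates in sorted(sample_rates_by_speaker("dataset/raw").items()):
    print(speaker, sorted(rates))
```

Running the same audit on the downsampled output dir should then show a single rate, 16000, for every speaker; anything else points at files downsample.py missed.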