ebadawy / voice_conversion

MIT License
129 stars 36 forks

Exploding losses during voice-conversion training #17

Open neuronx1 opened 2 years ago

neuronx1 commented 2 years ago

Thanks for the great repository. Unfortunately, I have a problem during voice-conversion training.

After the first two epochs I get exploding losses. (attached screenshot: grafik)

What's the reason for that and how can I solve this?

I would appreciate any tips.

Thanks in advance!

ebadawy commented 2 years ago

@neuronx1 do you still have the problem? If possible, can you mention which datasets you used and your setup/training steps?

neuronx1 commented 2 years ago

@ebadawy thanks for your reply! Unfortunately, I still have the problem. I started from your pretrained model and fine-tuned it on my own dataset. The dataset contains two speakers, with 900 audio files per speaker, each between 5 and 30 seconds long. In preprocessing, I removed the silence from each of these files. The sampling rate is 22,050 Hz, the audio is mono, and the files are 16-bit. I would be very happy to receive any tips!
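For reference, silence removal on 16-bit mono PCM can be sketched roughly like this. This is my own illustration of the idea, not the repository's `src/preprocess.py`, and the threshold value is an assumption:

```python
def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose absolute amplitude is
    below `threshold` (16-bit PCM values range from -32768 to 32767)."""
    start = 0
    while start < len(samples) and abs(samples[start]) < threshold:
        start += 1
    end = len(samples)
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]

# Quiet edges are removed; the loud middle is kept.
print(trim_silence([0, 10, 8000, -9000, 12, 0]))  # [8000, -9000]
```

In practice, a library routine such as `librosa.effects.trim` (which works on a dB threshold rather than raw amplitude) is the more robust choice.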

ebadawy commented 2 years ago

Did you use the same preprocessing that we do? It should be done through src/preprocess.py. I believe the current sample rate we are using is 16 kHz, so you will probably need to change the hyperparameters if you want to use a different sample rate. One thing I would recommend trying is training the model from scratch, without the pretrained models. The final quality might not be as good, but you should not get exploding losses.
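Independent of this repository, a common general mitigation for exploding losses is gradient-norm clipping (in PyTorch, `torch.nn.utils.clip_grad_norm_`). A minimal plain-Python sketch of the idea, purely for illustration:

```python
import math

def clip_grad_norm(grads, max_norm):
    """Rescale a flat list of gradient values so that their global
    L2 norm is at most `max_norm` (the direction is preserved)."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads

# A gradient of norm 5 gets rescaled to norm 1; direction is unchanged.
print(clip_grad_norm([3.0, 4.0], 1.0))  # approximately [0.6, 0.8]
```

Whether clipping is appropriate here depends on whether the explosion comes from the optimization itself or, as suggested below, from a bug introduced into train.py.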

ColorBuffer commented 2 years ago

I have the same problem.
Dataset: Flickr
Speakers: number 4, number 7
Default options of the repository.

03vmate commented 2 years ago

Commit 652fa0bdc0f774a4bedc81b5a549bad1c2158f2a completely broke train.py. Revert it and fix the indentation yourself; that will fix this issue.

ebadawy commented 2 years ago

So the current state of the code does not seem to be working properly. Unfortunately, I won't have the time to track down the bug. However, to get a working version you can revert to be362f9102e01e9e173657c5b31650c7dcc3699d. This is the original training pipeline that I know for sure works.
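For anyone following along, the revert described above can be done like this (assuming you have a local clone of the repository; the branch name is illustrative):

```shell
# Inside your clone of ebadawy/voice_conversion:
# move the working tree to the known-good commit
git checkout be362f9102e01e9e173657c5b31650c7dcc3699d

# or, to keep working on a branch pinned at that commit:
git checkout -b working-pipeline be362f9102e01e9e173657c5b31650c7dcc3699d
```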

The same solution should work for issues #20, #19 and #15.