Rudrabha / Lip2Wav

This repository contains the code for our CVPR 2020 paper, "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis".
MIT License

Results of training on my own dataset #39

Open dongdongdashen opened 2 years ago

dongdongdashen commented 2 years ago

Hi, when I train on my own dataset, the result is as shown in the attached pictures. Can you tell me where the problem is? Maybe the dataset is too short? (My dataset is about 20 minutes.) Looking forward to your reply. (Attached: step-5000-align, step-5000-mel-spectrogram)

Domhnall-Liopa commented 2 years ago

Hi, is it possible you've got teacher forcing enabled? Going by the attention matrix, the encoder doesn't seem to have an impact on the decoder
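To make the diagnosis above concrete: a healthy lip-to-speech alignment plot concentrates attention mass along a roughly monotonic diagonal, while a decoder that ignores the encoder (e.g. because it only ever sees ground-truth frames via teacher forcing) shows flat or vertical attention. Below is a rough, hypothetical helper to quantify that; it is not part of the Lip2Wav codebase, just a sketch of the idea.

```python
def attention_diagonality(align):
    """Rough diagonality score for an attention/alignment matrix.

    align: list of rows, shape (decoder_steps, encoder_steps), each row
    summing to 1 (attention weights). Returns the mean attention mass that
    falls within one step of the ideal diagonal. Close to 1.0 means a
    clean diagonal alignment; much lower suggests the decoder is not
    attending to the encoder. (Hypothetical diagnostic, not from Lip2Wav.)
    """
    T = len(align)
    S = len(align[0])
    score = 0.0
    for t in range(T):
        # Encoder position the diagonal would attend to at decoder step t.
        center = int(round(t * (S - 1) / max(T - 1, 1)))
        lo, hi = max(0, center - 1), min(S, center + 2)
        score += sum(align[t][lo:hi])
    return score / T
```

A perfectly diagonal 4x4 alignment scores 1.0, while a uniform (encoder-ignoring) one scores well below that.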

dongdongdashen commented 2 years ago

> Hi, is it possible you've got teacher forcing enabled? Going by the attention matrix, the encoder doesn't seem to have an impact on the decoder

Hi, I checked hparams.py and train.py in the synthesizer folder. How can I stop teacher forcing during training? Is this OK? (Attached screenshot)

Domhnall-Liopa commented 2 years ago

I think if you use the following in hparams.py:

tacotron_teacher_forcing_mode="constant",
tacotron_teacher_forcing_ratio=0.,

it will disable teacher forcing during training.
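For intuition, here is a toy sketch of what those two hparams control inside a Tacotron-style decode loop. The decoder step below is a trivial stand-in (hypothetical), not the real Tacotron cell; only the teacher-forcing switch mirrors the settings above: in "constant" mode with ratio 0.0, the decoder always feeds back its own previous output, and with ratio 1.0 it always consumes the ground-truth frame.

```python
import random

def decoder_step(prev_frame):
    """Stand-in for one decoder step (hypothetical identity placeholder)."""
    return prev_frame

def decode(target_mels, teacher_forcing_ratio):
    """Toy decode loop showing the constant teacher-forcing switch.

    teacher_forcing_ratio=0.0 -> decoder is free-running (consumes its own
    previous output); 1.0 -> fully teacher-forced (consumes ground truth).
    """
    outputs = []
    prev = 0.0  # placeholder for the <GO> frame
    for target in target_mels:
        out = decoder_step(prev)
        outputs.append(out)
        # Per-step teacher-forcing decision, as in "constant" mode.
        prev = target if random.random() < teacher_forcing_ratio else out
    return outputs
```

With the identity decoder step, ratio 0.0 never lets the targets leak in, while ratio 1.0 simply echoes the targets shifted by one step.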

dongdongdashen commented 2 years ago

> I think if you use the following in hparams.py:
>
> tacotron_teacher_forcing_mode="constant",
> tacotron_teacher_forcing_ratio=0.,
>
> It will disable teacher forcing in training

thanks!

comvee commented 2 years ago

Could you share the attention plot after disabling the teacher-forcing? Did you solve the problem?