andabi / deep-voice-conversion

Deep neural networks for voice conversion (voice style transfer) in Tensorflow
MIT License
3.92k stars, 843 forks

Did anyone finish sequence-to-sequence attention training? #82

Open benlaitang opened 5 years ago

benlaitang commented 5 years ago

I wrote this with reference to https://github.com/keithito/tacotron, but it does not work. Feeding the ground-truth mel-spectrogram as input works, but feeding the predicted mel-spectrogram fails. Can anyone give me some advice?

MorganCZY commented 5 years ago

I have this issue too. The audio from the validation process sounds great, but in the testing process the predicted mel-spectrogram, rather than the ground truth, is fed into the next time step's pre-net, which produces quite abnormal audio. I also found that the alignment images were not diagonal, which suggests the attention mechanism has not been learned well. However, I don't know how to adjust the model or the training strategy.
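To make the mismatch between validation and testing concrete, here is a minimal toy sketch in NumPy. It is not code from this repository or from keithito/tacotron: `decode`, `toy_step`, and the linear "decoder" with a small systematic error are all hypothetical, chosen only to show how feeding predictions back in (free running) lets errors accumulate, while feeding the ground-truth previous frame (teacher forcing) keeps them small.

```python
import numpy as np

def decode(step_fn, go_frame, n_steps, targets=None):
    """Run a toy autoregressive decoder for n_steps.

    If `targets` is given, the ground-truth frame is fed as the next input
    (teacher forcing). Otherwise the decoder's own prediction is fed back
    (free running), as happens at test time.
    """
    prev = go_frame
    outputs = []
    for t in range(n_steps):
        pred = step_fn(prev)
        outputs.append(pred)
        # Choose what the next step sees as its "previous frame".
        prev = targets[t] if targets is not None else pred
    return np.stack(outputs)

def toy_step(prev_frame):
    # Hypothetical stand-in for pre-net + decoder: the true dynamics are
    # prev + 0.1, but this "model" is slightly off (1.05 instead of 1.0).
    return 1.05 * prev_frame + 0.1

n_steps, n_mels = 50, 80
go_frame = np.zeros(n_mels)
# Ground-truth "mel" frames: each frame is the previous frame plus 0.1.
targets = 0.1 * np.arange(1, n_steps + 1)[:, None] * np.ones((1, n_mels))

teacher_forced = decode(toy_step, go_frame, n_steps, targets=targets)
free_running = decode(toy_step, go_frame, n_steps)

# Mean absolute error per time step for both decoding modes.
teacher_err = np.abs(teacher_forced - targets).mean(axis=1)
free_err = np.abs(free_running - targets).mean(axis=1)

print("teacher-forced error at last step:", teacher_err[-1])
print("free-running error at last step:  ", free_err[-1])
```

In this sketch the same small per-step error stays bounded under teacher forcing but compounds in free-running mode, which is one plausible reading of why validation audio can sound fine while test audio does not.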

wishvivek commented 5 years ago

Yes, any clues about the Seq2Seq+Attention in this network would be great! Please post an update if anyone finds a solution. Thanks!