Closed by Sevan-cpu 2 years ago
Can you attach some samples for me to listen to?
Of course, this one for example: test_EMO.zip
I have listened to the samples. The quality is reasonable for a Griffin-Lim vocoder. The noise is introduced by the vocoder; if you train a neural vocoder, the noise will be gone. You can listen to this demo page: https://kunzhou9646.github.io/IS21/. When I was using the Griffin-Lim vocoder, the quality was similar to yours. When I used a WaveRNN vocoder, the quality improved greatly.
In the demo page, under Speech Quality Test:
Seq2seq-EVC-GL: uses the Griffin-Lim vocoder
Seq2seq-EVC-WA1: WaveRNN trained on VCTK
Seq2seq-EVC-WA2: WaveRNN trained on VCTK, then fine-tuned on ESD
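To illustrate why a Griffin-Lim vocoder leaves audible artifacts, here is a minimal pure-Python sketch of the Griffin-Lim iteration. It is a toy, not the vocoder code used in this project: the frame length, hop size, window, test signal, and all function names are assumptions made for the example, and it works on a linear STFT magnitude rather than a mel-spectrogram. The point it shows is that the phase is never known, only estimated by alternating between the target magnitudes and a consistent waveform, so some magnitude mismatch usually remains after the last iteration.

```python
import cmath
import math

N, HOP = 16, 4  # toy frame length and hop size (assumed values for this sketch)
# Shifted Hann window: strictly positive, so the overlap-add normaliser is never zero.
WIN = [0.5 - 0.5 * math.cos(2 * math.pi * (n + 0.5) / N) for n in range(N)]


def dft(frame, sign=-1):
    """Naive O(N^2) DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    return [sum(v * cmath.exp(sign * 2j * math.pi * k * n / N)
                for n, v in enumerate(frame)) for k in range(N)]


def stft(x):
    """Windowed DFT of each frame; frames start every HOP samples."""
    return [dft([x[s + n] * WIN[n] for n in range(N)])
            for s in range(0, len(x) - N + 1, HOP)]


def istft(spec, length):
    """Least-squares inverse STFT: windowed overlap-add over the sum of squared windows."""
    num, den = [0.0] * length, [0.0] * length
    for m, frame_spec in enumerate(spec):
        frame = [v.real / N for v in dft(frame_spec, sign=+1)]
        for n in range(N):
            num[m * HOP + n] += WIN[n] * frame[n]
            den[m * HOP + n] += WIN[n] ** 2
    return [a / b for a, b in zip(num, den)]


def magnitude_error(spec, target_mag):
    """Squared distance between the magnitudes of spec and the target magnitudes."""
    return sum((abs(v) - a) ** 2
               for fs, fa in zip(spec, target_mag) for v, a in zip(fs, fa))


def griffin_lim(target_mag, length, n_iter=30):
    """Estimate a waveform from magnitudes only; returns (waveform, error per iteration)."""
    spec = [[complex(a) for a in frame] for frame in target_mag]  # start from zero phase
    errors = []
    for _ in range(n_iter):
        x = istft(spec, length)   # closest waveform to the current spectrogram
        est = stft(x)             # its (consistent) spectrogram
        errors.append(magnitude_error(est, target_mag))
        # Keep the estimated phase, force the target magnitude back in.
        spec = [[a * cmath.exp(1j * cmath.phase(v)) for v, a in zip(fe, fa)]
                for fe, fa in zip(est, target_mag)]
    return x, errors


# Toy signal: two sinusoids, 64 samples, so the frames cover every sample exactly.
signal = [math.sin(0.3 * t) + 0.5 * math.sin(0.9 * t + 1.0) for t in range(64)]
target = [[abs(v) for v in frame] for frame in stft(signal)]  # phase is discarded here
estimate, errors = griffin_lim(target, len(signal))
print(f"magnitude error: first iteration {errors[0]:.4f}, last iteration {errors[-1]:.4f}")
```

The error shrinks from one iteration to the next, but the waveform is still built from a guessed phase. A neural vocoder such as WaveRNN instead learns to generate the waveform directly from the acoustic features.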
Yes, I found it and I will try WaveRNN now. Do you have an example of a WaveRNN implementation, or should I try to find one myself?
Thanks a lot for the quick response
Yes, for the WaveRNN implementation I am using this repo: https://github.com/fatchord/WaveRNN. I am sorry, but my pre-trained models are lost.
I also recommend ParallelWaveGAN: https://github.com/kan-bayashi/ParallelWaveGAN. This is also an excellent neural vocoder. If needed, I can share my pre-trained models. Just let me know.
Yes, it would be great if you could share them! I will try both.
Thanks
Hi, I'm working on your project and have already done the fine-tune training for the 0013 voice in ESD (300 epochs = checkpoint_10000). I then tried inference with it, but the result of converting Neutral to Angry sounds very noisy. Is this normal, or do I have to run the fine-tune for much longer, like 10,000 epochs?
Thank you for your work and your help.
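For a sense of scale, the figures in the question can be converted between epochs and iterations. This is only a back-of-the-envelope sketch: it assumes the number in checkpoint_10000 is the training iteration count, which the thread does not confirm, and the variable names are made up for the example.

```python
# Assumption: "checkpoint_10000" means 10,000 training iterations were run.
epochs_done = 300
iterations_done = 10_000

# Implied number of iterations (batches) per epoch under that assumption.
iters_per_epoch = iterations_done / epochs_done

# The "10,000 epochs" mentioned in the question, expressed in iterations.
proposed_epochs = 10_000
proposed_iterations = round(proposed_epochs * iters_per_epoch)

print(f"about {iters_per_epoch:.1f} iterations per epoch")
print(f"{proposed_epochs} epochs would be about {proposed_iterations} iterations")
```

Under that assumption, 10,000 epochs would be roughly 33 times more training than the run described in the question.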