How many epochs of fine_tune training are necessary for the 0013 voice?

KunZhou9646 / seq2seq-EVC

This is the implementation of our Interspeech 2021 paper: Limited data emotional voice conversion leveraging text-to-speech: two-stage sequence-to-sequence training.

How many epochs of fine_tune training are necessary for the 0013 voice? #6

Closed by Sevan-cpu 2 years ago

Sevan-cpu commented 2 years ago

Hi, I'm working on your project. I have already run the fine_tune training for the 0013 voice in ESD (300 epochs = checkpoint_10000) and tried inference with it, but the Neutral-to-Angry conversion sounds very noisy. Is that normal, or do I have to run fine_tune for much longer, like 10,000 epochs?

Thank you for your work and your help.
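For reference, the epoch and checkpoint figures in the question can be related with simple arithmetic. The sketch below assumes that the checkpoint number counts training iterations and that every epoch has the same number of iterations; neither is confirmed in this thread, and the function name is only illustrative.

```python
# Figures quoted in the question: 300 epochs produced checkpoint_10000.
CHECKPOINT_ITERS = 10_000
EPOCHS_DONE = 300

# Assumption: the checkpoint number is an iteration count and the
# number of iterations per epoch stays constant.
ITERS_PER_EPOCH = CHECKPOINT_ITERS / EPOCHS_DONE


def iterations_for(epochs):
    """Estimate the iteration count reached after `epochs` epochs."""
    return round(epochs * ITERS_PER_EPOCH)


if __name__ == "__main__":
    print(f"about {ITERS_PER_EPOCH:.1f} iterations per epoch")
    print(f"10,000 epochs would be about {iterations_for(10_000)} iterations")
```

Under those assumptions, 10,000 epochs would mean roughly 33 times more training than the run described above.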

KunZhou9646 commented 2 years ago

Can you attach some samples for me to listen?

Sevan-cpu commented 2 years ago

Of course, this one for example: test_EMO.zip

KunZhou9646 commented 2 years ago

I have listened to the samples. The quality is reasonable for a Griffin-Lim vocoder; the noise is introduced by the vocoder. If you train a neural vocoder, the noise will be gone. You can listen to this demo page: https://kunzhou9646.github.io/IS21/. When I was using the Griffin-Lim vocoder, the quality was similar to yours. When I used a WaveRNN vocoder, the quality improved greatly.
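To illustrate why Griffin-Lim output tends to sound noisy: the algorithm only has spectrogram magnitudes and has to estimate the phase iteratively, so the result is an approximation of the original waveform. The following is a minimal NumPy sketch of that idea, not the code used in this repo. The FFT size, hop length, iteration count, and helper names are all assumptions chosen for illustration.

```python
import numpy as np


def stft(x, n_fft=512, hop=128):
    """Hann-windowed STFT with reflect padding; returns (frames, bins)."""
    window = np.hanning(n_fft + 1)[:-1]  # periodic Hann window
    x = np.pad(x, n_fft // 2, mode="reflect")
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1)


def istft(spec, n_fft=512, hop=128):
    """Inverse STFT by windowed overlap-add with window-energy normalization."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = spec.shape[0]
    frames = np.fft.irfft(spec, n=n_fft, axis=1) * window
    out = np.zeros(n_fft + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i]
        norm[i * hop:i * hop + n_fft] += window ** 2
    out /= np.maximum(norm, 1e-8)
    pad = n_fft // 2
    return out[pad:-pad]  # drop the padding added by stft()


def griffin_lim(mag, n_iter=32, n_fft=512, hop=128, seed=0):
    """Estimate a waveform from STFT magnitudes alone (phase is unknown)."""
    rng = np.random.default_rng(seed)
    # Start from random phase, then alternate between the time domain and
    # the STFT domain, keeping the known magnitudes at every step.
    angles = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        audio = istft(mag * angles, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        angles = rebuilt / np.maximum(np.abs(rebuilt), 1e-8)
    return istft(mag * angles, n_fft, hop)


if __name__ == "__main__":
    sr = 16000
    t = np.arange(sr) / sr
    tone = 0.5 * np.sin(2 * np.pi * 440.0 * t)  # one-second test tone
    estimate = griffin_lim(np.abs(stft(tone)), n_iter=16)
    print(len(tone), len(estimate))
```

A neural vocoder such as WaveRNN replaces this iterative phase estimation with a learned model that generates the waveform from the acoustic features.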

KunZhou9646 commented 2 years ago

In the demo page, under Speech Quality Test:

Seq2seq-EVC-GL: the one that uses the Griffin-Lim vocoder
Seq2seq-EVC-WA1: WaveRNN trained on VCTK
Seq2seq-EVC-WA2: WaveRNN trained on VCTK, then fine-tuned on ESD

Sevan-cpu commented 2 years ago

Yes, I found it, and I will try WaveRNN now. Do you have an example of a WaveRNN implementation, or should I try to find one myself?

Thanks a lot for the quick response.

KunZhou9646 commented 2 years ago

Yes, for the WaveRNN implementation I am using this repo: https://github.com/fatchord/WaveRNN. Unfortunately, my pre-trained WaveRNN models are lost.

I also recommend ParallelWaveGAN: https://github.com/kan-bayashi/ParallelWaveGAN. This is also an excellent neural vocoder. If needed, I can share my pre-trained models. Just let me know.

Sevan-cpu commented 2 years ago

Yes, it would be great if you could share them! I will try both.

Thanks

KunZhou9646 commented 2 years ago

Parallel WaveGAN: https://drive.google.com/file/d/1n01r7p-XALB6jcI8QGPfX4o9C7JRo6kX/view?usp=sharing