Closed · terbed closed this issue 3 years ago
Hi @terbed , thanks for creating this issue. We found that the quality of conversion depended heavily on the pair of speakers selected. It worked better between speakers of the same gender. However, we saw large improvements over CycleGAN-VC3 and CycleGAN-VC2.
It looks like it's stuck in a local optimum, and after that the generator loss does not improve. Any suggestions for escaping the local optimum?
@todalex Maybe cyclical learning rate schedulers? https://pytorch.org/docs/master/generated/torch.optim.lr_scheduler.CyclicLR.html
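As a minimal sketch of that suggestion (the parameter values are illustrative, not from this repo), a cyclical schedule on the generator optimizer could look like this:

```python
import torch

# Dummy parameter; in practice this would be the MaskCycleGAN-VC generator's parameters.
params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.Adam(params, lr=5e-4)

# Cycle the learning rate between 1e-5 and 5e-4, rising over 2000 iterations.
# cycle_momentum=False is required with Adam, which has no `momentum` field.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-5, max_lr=5e-4,
    step_size_up=2000, mode="triangular", cycle_momentum=False,
)

for _ in range(1000):  # one scheduler step per training iteration
    optimizer.step()
    scheduler.step()

print(optimizer.param_groups[0]["lr"])  # lr has risen partway toward max_lr
```

The periodic lr increase is what can kick the optimizer out of a sharp local minimum.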
Thank you @todalex and @terbed for your suggestions. May I ask which speakers both of you are converting between?
I trained a model for conversion between VCC2SF3 and VCC2TF1, and the results were generally comparable to the paper's results after 3500 epochs (~260,000 iterations). They can be found here: https://github.com/GANtastic3/MaskCycleGAN-VC/tree/main/audio_samples/VCC2SF3_VCC2TF1
The original paper's authors also convert between these speakers: http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/maskcyclegan-vc/index.html
There is a possibility that conversion works better between some speakers and this would be very interesting to investigate.
@todalex Since the generator's loss is adversarial, you would expect the loss curve to plateau while the generator and discriminator continue to compete against each other to improve the conversion process. Unfortunately, generator loss convergence is not a good metric to determine when training concludes. Could you try training for longer and sharing your results?
@HikaruHotta Dear Hikaru, I am using my own dataset of Persian speakers. Since epoch 700 my g_loss has been 7.5; I continued training until epoch 2400 and, sadly, the generator loss did not change.
@todalex The generator should continue to improve even if the generator loss stays at 7.5 or increases. Could you attach an audio sample at epochs 700 and 2400 of the ground truth and the converted audio? Could you also attach the training curves for both generators and discriminators?
Hi all, I am also training on my own dataset, and my result is https://prnt.sc/126uirv
For good results, is it necessary that the generator losses be less than 1?
Hi,
First of all, thank you for this nice implementation.
I trained the network with the default settings and data (~500k iterations), but the results are really unnatural (e.g. link) and far from the samples provided by the authors of the paper: http://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/maskcyclegan-vc/index.html
Why is this? Did you experience the same, or did you get nice results?
In my case, the performance of the neural vocoder (MelGAN) limited the synthesized voice. In the paper, they might use different MelGAN weights which could better capture the speaker's voice.
You can use the HiFi-GAN vocoder; it's better than MelGAN.
@pavelxx1 https://prnt.sc/126uirv seems to be a dead link
The link is OK, check your internet.
P.S. From epoch 3000 to 5000 I have the same plateau, g_loss 8.0-8.5 :(
and the results are worse after inference (testing)
What is your problem with this? I think it is OK. From 10k iterations there is a drop, because the identity loss term is eliminated (and the lr scheduler also starts).
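To make that step in the loss curve concrete, here is a toy sketch of a CycleGAN-style total generator loss where the identity term is dropped after a threshold iteration. The function name, loss weights, and the 10k cutoff are illustrative, not read from this repo's code:

```python
def generator_total_loss(adv, cycle, identity, iteration,
                         lambda_cycle=10.0, lambda_id=5.0, id_stop=10_000):
    # The identity loss is typically only used early in training to stabilize
    # the mapping; after `id_stop` iterations it is dropped, which produces
    # a visible step down in the plotted total generator loss.
    lambda_id_eff = lambda_id if iteration < id_stop else 0.0
    return adv + lambda_cycle * cycle + lambda_id_eff * identity

early = generator_total_loss(1.0, 0.5, 0.3, iteration=5_000)
late = generator_total_loss(1.0, 0.5, 0.3, iteration=15_000)
print(early, late)  # 7.5 6.0
```

Same component losses before and after the cutoff, yet the plotted total drops, which is the kink visible around 10k iterations.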
@HikaruHotta Dear Hikaru, I changed my data to the same gender, as you said that would give better results, but I have the same problem and g_loss is not getting better. Here is my tensorboard: https://tensorboard.dev/experiment/3QJKZ0ZjQQa0dw5TMoLRbw/ and here is the converted output at epoch 2000: https://drive.google.com/drive/folders/1-8xE4AvkjSMr2h3_lPrNToHmtVyiCluN?usp=sharing
@todalex It's great that you're experimenting with new datasets to determine the robustness of MaskCycleGAN-VC.
As mentioned above, g_loss does not always monotonically decrease when training GANs, because you have two models competing against each other. The optimization problem changes every time the generator or discriminator is updated. Sometimes g_loss goes up while the generator still improves.
Here is a stackoverflow link on how to interpret GAN losses: https://stackoverflow.com/questions/42690721/how-to-interpret-the-discriminators-loss-and-the-generators-loss-in-generative
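To illustrate why g_loss is a relative rather than absolute metric, here is a toy sketch (not the repo's actual loss code) of an LSGAN-style generator loss evaluated against two differently initialized discriminators:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
fake = torch.randn(8, 16)  # stand-in for one batch of generator outputs

# Two different "discriminators" (randomly initialized linear critics).
d_a = torch.nn.Linear(16, 1)
d_b = torch.nn.Linear(16, 1)

# LSGAN-style generator loss: push D(fake) toward the "real" label 1.
loss_a = F.mse_loss(d_a(fake), torch.ones(8, 1))
loss_b = F.mse_loss(d_b(fake), torch.ones(8, 1))

# Identical generator output, two different loss values: g_loss measures the
# generator *relative to the current discriminator*, not its absolute quality.
print(loss_a.item(), loss_b.item())
```

Since the discriminator keeps improving during training, a flat g_loss can coexist with steadily improving conversions.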
I would encourage you to train your model for longer since it's probably still learning. Your model is definitely not suffering from mode collapse. Do you have audio samples from the real dataset? It could be helpful to diagnose what could be going wrong.
@HikaruHotta Thank you for the link about GAN losses, it was very helpful. This is my dataset of the same gender (male), split into two parts (train/s1, eval/s1 and train/s2, eval/s2): 4/1AY0e-g6GjgqzHZ_j0mMsKusCQ5fcite0EbMNlN26v3D1gLnMx2TVnTOOczs
As you said, I will continue training because the d_loss is descending, but I think the model has learned well enough and reached some optimum, yet the output is not good. How can I change the learning rate and find a good learning rate for my data, so that I can try training again with a new learning rate and see if the result is any better?
@HikaruHotta, if I want to use the MelGAN vocoder for a non-English dataset, must I also train the MelGAN vocoder from scratch on my dataset? Thanks.
You can modify the --lr (learning rate) argument as shown here: https://github.com/GANtastic3/MaskCycleGAN-VC#training
@pavelxx1 Seeing that the melGAN vocoder was trained on https://keithito.com/LJ-Speech-Dataset/, I would expect that it doesn't model the complexities of other languages. I suggest that you take a look at Universal MelGAN which seems to work across multiple languages.
Thanks, but if I want to use MelGAN for my language (UA), must I also train a vocoder?
No, the vocoder is language independent; I am using it for Hungarian and it works quite well.
My performance issue with a male speaker is solved by #7; the bad performance can be attributed to the fact that the spectrogram was scaled with the female speaker's statistics.
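A minimal sketch of the per-speaker normalization issue that fix addresses (function names and toy data are illustrative): each speaker's mel spectrograms should be normalized with that speaker's own statistics, not another speaker's.

```python
import numpy as np

def speaker_stats(mels):
    """Per-mel-bin mean/std over all frames of one speaker's training spectrograms."""
    stacked = np.concatenate(mels, axis=1)  # (n_mels, total_frames)
    return stacked.mean(axis=1, keepdims=True), stacked.std(axis=1, keepdims=True)

rng = np.random.default_rng(0)
# Toy corpora for two speakers with deliberately different statistics.
male = [rng.normal(-4.0, 2.0, size=(80, 100)) for _ in range(3)]
female = [rng.normal(-2.0, 1.0, size=(80, 100)) for _ in range(3)]

m_mean, m_std = speaker_stats(male)
f_mean, f_std = speaker_stats(female)

mel = male[0]
right = (mel - m_mean) / m_std   # normalized with the matching speaker's stats
wrong = (mel - f_mean) / f_std   # scaled with the other speaker's stats

# The wrongly scaled input is shifted and stretched away from zero mean / unit
# variance, which degrades what the generator sees at train and test time.
print(abs(right.mean()), abs(wrong.mean()))
```

With mismatched statistics the input distribution the model receives no longer matches the one it was trained on, consistent with the degraded male-speaker results reported above.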
@HikaruHotta What I meant is: please help me find a good lr value, as I don't know what number I should change it to; the current value is 5e-4. Can I change the lr at epoch 3000, for example, to escape the local optimum?
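There is no universally good value, but as a sketch of the mechanics (the 10x reduction and the stand-in parameters are illustrative), you can change the learning rate mid-training by editing the optimizer's parameter groups after loading a checkpoint:

```python
import torch

params = [torch.nn.Parameter(torch.zeros(1))]  # stand-in for generator parameters
optimizer = torch.optim.Adam(params, lr=5e-4)

# ... imagine loading a checkpoint saved at epoch 3000 here ...

# Drop the lr by 10x to take smaller steps around the current optimum.
for group in optimizer.param_groups:
    group["lr"] = 5e-5

print(optimizer.param_groups[0]["lr"])  # 5e-05
```

Lowering the lr helps settle into a minimum; to escape one, an lr *increase* or a cyclical schedule (as suggested earlier in the thread) is usually tried instead.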