NVIDIA / waveglow

A Flow-based Generative Network for Speech Synthesis
BSD 3-Clause "New" or "Revised" License

Why is the output voice male when the training data is a female voice? #150

Closed chazo1994 closed 4 years ago

chazo1994 commented 5 years ago

I trained an end-to-end model with Tacotron 2 and WaveGlow, using 25 hours of female speech to train both models. The sampling rate of the training data is 22050 Hz. With Tacotron 2 at checkpoint_87000 (epoch 398) and WaveGlow at waveglow_98000 (epoch 20), the synthesized voice is male even though the training voice is female. Could you please help me explain this problem and fix it?

Thank you so much!

tianrengao commented 5 years ago

Did you feed in a mel-spectrogram of a male voice? Your training data is female, but if you run inference on a male speaker's mel-spectrogram, you will get a male voice.

enamoria commented 5 years ago

Check the mel-spectrogram feature extraction used for Tacotron 2 and for WaveGlow; the two must match. You can also check the quality of your Tacotron 2 model by using Griffin-Lim to generate a wav from its predicted mels. That tells you which model is causing the problem, Tacotron 2 or WaveGlow.

rafaelvalle commented 4 years ago

Closing due to inactivity.

Ashbajawed commented 4 years ago

Hello @tianrengao, I want to generate a male voice, so do I just need to change the mel-spec? Is no male dataset required?

Also, is there any guide on where I can get a male mel-spec?