NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Style not being applied #64

Open kannadaraj opened 4 years ago

kannadaraj commented 4 years ago

Hi. I have trained a Mellotron model on single-speaker data with multiple speaking styles, like a story audiobook. The data has a high degree of intonation and pitch variation. Total duration is about 19 hours. Training goes well, and the curves and alignments also look good.

But during inference, when I try to use a style file to impart style, nothing is applied. The text is synthesized normally, as if no style were available. I tried both variants, simple GST and GST+f0+pitch variation, i.e. inference and inference_noattn. Neither the style nor the pitch/f0 variation is applied. The duration of the audio is similar to that of the synthesized audio.

Can you please suggest what might be the problem? Thanks.

rafaelvalle commented 4 years ago

The pitch contour should definitely be applied. Can you share mel-spectrograms, alignments and your pitch contour? What code are you using for running inference?

kannadaraj commented 4 years ago

@rafaelvalle: thanks a lot for replying.

Here I am using a style file to mimic the style. The sentence in the style file is not the same as the sentence being synthesized. I am using GST-only mode, with:

mel_outputs, mel_outputs_postnet, gate_outputs, _ = mellotron.inference((text_encoded, mel, speaker_id, pitch_contour))

The input text is "I am spending time with the family." The style file is a highly emotive (happy) sentence, but I see just normal synthesis. It doesn't apply any of the style of the file. Can you please help?

[Image: plot_mel_f0_alignment]
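
One way to tell "the style reference is ignored" apart from "the effect is just subtle" is to synthesize the same text with two different style files and compare the resulting mel-spectrograms numerically. The sketch below is only an illustration and is not part of the Mellotron code: `mel_difference` is a hypothetical helper, and the random arrays are stand-ins for `mel_outputs_postnet` converted to NumPy, assumed here to have shape (n_mels, frames).

```python
import numpy as np


def mel_difference(mel_a, mel_b):
    """Mean absolute difference between two mel-spectrograms.

    Hypothetical helper (not a Mellotron function). Both inputs are
    assumed to have shape (n_mels, frames); the longer one is truncated
    so that the frame counts match before comparing.
    """
    mel_a = np.asarray(mel_a, dtype=np.float64)
    mel_b = np.asarray(mel_b, dtype=np.float64)
    if mel_a.shape[0] != mel_b.shape[0]:
        raise ValueError("mel-spectrograms must have the same number of mel bins")
    frames = min(mel_a.shape[1], mel_b.shape[1])
    return float(np.mean(np.abs(mel_a[:, :frames] - mel_b[:, :frames])))


# Stand-in data. In practice these would be the outputs for the same text
# synthesized with different style files.
rng = np.random.default_rng(0)
mel_style_1 = rng.normal(size=(80, 120))
mel_style_2 = mel_style_1.copy()            # style had no effect at all
mel_style_3 = mel_style_1[:, :100] + 0.5    # style changed level and length

identical = mel_difference(mel_style_1, mel_style_2)
shifted = mel_difference(mel_style_1, mel_style_3)
print(f"same output: {identical:.3f}, different output: {shifted:.3f}")
```

A useful baseline is the difference between two runs with the same style file: if swapping the style file does not move the number noticeably above that baseline, the reference is probably not reaching the decoder.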

rafaelvalle commented 4 years ago

Can you pull from master and try again?

kannadaraj commented 4 years ago

@rafaelvalle thanks for the update. I am retraining with your length inclusion modification. Will keep you posted 👍