Open kannadaraj opened 4 years ago
The pitch contour should definitely be applied. Can you share mel-spectrograms, alignments, and your pitch contour? What code are you using to run inference?
@rafaelvalle : thanks a lot for replying.
Here I am using a style file to mimic its style. The style file is not the same sentence as the one being synthesized. I am using GST-only mode, with:
mel_outputs, mel_outputs_postnet, gate_outputs, _ = mellotron.inference((text_encoded, mel, speaker_id, pitch_contour))
The input text is "I am spending time with the family." The style file is a highly emotive (happy) sentence, but I just see normal synthesis. It doesn't apply any of the style from the file. Can you please help?
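Since the pitch contour is passed straight into the inference call above, one quick check before sharing it is whether it contains any voiced frames at all. The following is only a minimal sketch, not part of Mellotron: the helper name `summarize_pitch_contour` is hypothetical, the example values are made up, and it assumes the contour is a flat sequence of f0 values in Hz with unvoiced frames stored as 0.

```python
def summarize_pitch_contour(f0_values):
    """Return basic statistics for a 1-D sequence of f0 values in Hz.

    Assumption (not confirmed by the thread): unvoiced frames are stored as 0.0.
    """
    total = len(f0_values)
    # Keep only frames that carry a positive f0, i.e. voiced frames.
    voiced = [f for f in f0_values if f > 0.0]
    if total == 0 or not voiced:
        return {"frames": total, "voiced_fraction": 0.0,
                "min_hz": None, "max_hz": None}
    return {
        "frames": total,
        "voiced_fraction": len(voiced) / total,
        "min_hz": min(voiced),
        "max_hz": max(voiced),
    }


# Made-up contour for illustration only; real values come from the style file.
example_contour = [0.0, 0.0, 210.5, 220.0, 245.3, 0.0, 198.7, 0.0]
print(summarize_pitch_contour(example_contour))
```

If the contour is a PyTorch tensor, it can be flattened first with `pitch_contour.flatten().tolist()`. A voiced fraction of 0, or a very narrow range between `min_hz` and `max_hz`, would suggest the contour itself carries little pitch information to apply.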
Can you pull from master and try again?
@rafaelvalle thanks for the update. I am retraining with your length inclusion modification. Will keep you posted 👍
Hi. I have trained a Mellotron model on single-speaker data with multiple speaking styles, like a story audiobook. The data has a high degree of intonation and pitch variation. Total duration is about 19 hours. Training goes well, and the curves and alignment also look good.
But during inference, when I try to use a style file to impart style, it doesn't apply anything. The output is synthesized normally, as if no style were available. I tried both variants, simple GST and GST + f0 + pitch variation, that is, inference and inference_noattn. Neither the style nor the pitch/f0 variation is applied. The duration of the audio is similar to that of the synthesized audio.
Can you please suggest what might be the problem? Thanks.