Hi, I have a few doubts:

1) Is example1.wav the reference audio file whose style is to be captured when synthesizing samples in inference.ipynb? Do I need to have the text and its corresponding wav file ready in advance for inference in Mellotron? Usually I have text that I want synthesized and a reference audio of a completely different utterance whose style should be captured. I am unable to map this onto the existing inference.ipynb. Can anyone give some more clarity on this?
2) How do I run this model as a standalone TTS?
3) If I have trained my model on single-speaker data, how can I update the "Define Speaker Set" section in inference.ipynb? It seems to be written for multi-speaker models only.