NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Help needed w.r.t. inference #90

Open nirmeshshah opened 3 years ago

nirmeshshah commented 3 years ago

Hi, I have a few doubts:

1) Is example1.wav the reference audio file whose style is to be captured when synthesizing samples in inference.ipynb? Do I need to have the text and its corresponding wav file ready in advance for inference in Mellotron? Usually I have text that I want synthesized and a reference audio of a completely different utterance whose style is to be captured. I am unable to map this onto the existing inference.ipynb. Can anyone please give some more clarity on this?

2) How do I run this model as a standalone TTS?

3) If I have trained my model on single-speaker data, how can I update the Define Speaker Set section in inference.ipynb? It seems to be written for the multispeaker case only.
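For what it's worth, here is a minimal sketch of the data flow I understand from the GST setup, with stand-in stub functions (these are NOT Mellotron's actual API; `encode_style` and `synthesize` are hypothetical placeholders). The point is that the text to synthesize and the reference audio are independent inputs: the reference mel is compressed into a fixed style embedding, and the reference transcript is only needed when pitch/rhythm are also transferred via alignment. For a single-speaker checkpoint, the only valid speaker embedding id would presumably be 0.

```python
# Hypothetical sketch, not the real Mellotron code.

def encode_style(reference_mel):
    # Stand-in for a GST reference encoder: returns a "style embedding"
    # (here just the mean of the frames, purely for illustration).
    return sum(reference_mel) / len(reference_mel)

def synthesize(text, style_embedding, speaker_id=0):
    # Stand-in for the decoder, conditioned on text, style, and speaker id.
    # With a single-speaker model, the only embedding id is assumed to be 0.
    return f"audio(text={text!r}, style={style_embedding:.3f}, speaker={speaker_id})"

ref_mel = [0.1, 0.2, 0.3]      # pretend mel frames from example1.wav
style = encode_style(ref_mel)  # style comes from the reference audio only
out = synthesize("A completely different sentence.", style, speaker_id=0)
print(out)
```

If this mental model is wrong for inference.ipynb (e.g. if the notebook always requires the reference utterance's transcript for the attention/pitch path), a clarification from the maintainers would help.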