Hi!
I watched the video demonstration from the README.md file. The video mentions that the synthesizer generates a mel spectrogram for the input text using the given embedding, and that clicking "Synthesize only" multiple times produces slightly different speech. So my question is:
Does clicking "Synthesize only" multiple times and then vocoding generate a better result?
I tried to find the answer in the GitHub issues but couldn't find anything. I did, however, learn that loading multiple utterances from the same speaker does not improve output quality, because the synthesizer uses only a single embedding as its reference.