CorentinJ / Real-Time-Voice-Cloning

Clone a voice in 5 seconds to generate arbitrary speech in real-time

Quick question about the Synthesizer. #1155

Open · prakharpbuf opened this issue 1 year ago

prakharpbuf commented 1 year ago

Hi! I watched the video demonstration linked in the README.md file. The video mentions that the synthesizer generates a mel spectrogram for the input text using the given speaker embedding, and that clicking "Synthesize only" multiple times produces slightly different speech each time. So my question is: does clicking "Synthesize only" several times before vocoding produce a better result than synthesizing just once? I searched the GitHub issues but couldn't find an answer. I did, however, learn that loading multiple utterances from the same speaker does not improve output quality, because the output is generated from only one embedding as its reference.