Closed Rongjiehuang closed 3 years ago
Hi, I have a question about this work. In the function gen_from_file of gen_wavernn.py, we need to supply a speaker_embedding extracted from a wav file. But in a TTS system the vocoder only receives a mel-spectrogram to generate waveforms. In that case, how can we obtain the speaker_embedding? Thank you.

In the zero-shot TTS pipeline, you need to provide a reference speech utterance, from which the TTS system gathers the speaker identity and feeds it to the rest of the system. So you can obtain the speaker embedding by passing the reference speech through the speaker encoder module.

I see, thank you.
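For readers landing on this thread: the flow described in the answer (reference speech → speaker encoder → fixed-size speaker embedding, passed alongside the mel-spectrogram) can be sketched roughly as below. Note this is a toy stand-in, not the repository's actual speaker encoder: a real system would use a trained model such as a GE2E-style encoder, and `extract_speaker_embedding`, the 256-dim size, and the random projection here are all illustrative assumptions.

```python
import numpy as np

def extract_speaker_embedding(mel, embed_dim=256, seed=0):
    """Toy stand-in for a speaker encoder.

    A real encoder is a trained network; here we just mean-pool the
    reference mel frames, apply a frozen random projection, and
    L2-normalize, to show the shape of the interface.
    """
    rng = np.random.default_rng(seed)
    # Frozen "weights": (n_mels, embed_dim) projection matrix.
    proj = rng.standard_normal((mel.shape[1], embed_dim))
    pooled = mel.mean(axis=0)            # (n_mels,) summary of the utterance
    embed = pooled @ proj                # (embed_dim,) raw embedding
    return embed / (np.linalg.norm(embed) + 1e-8)  # unit-norm embedding

# Fake reference mel-spectrogram: (frames, n_mels).
ref_mel = np.abs(np.random.randn(120, 80))
spk_embed = extract_speaker_embedding(ref_mel)
print(spk_embed.shape)  # (256,)
```

The embedding computed this way from the reference wav's mel would then be the `speaker_embedding` argument that gen_from_file expects, while the synthesized mel-spectrogram is what the vocoder inverts to a waveform.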