dipjyoti92 / SC-WaveRNN

Official PyTorch implementation of Speaker Conditional WaveRNN

need speaker_embedding in gen_wavernn.py #5

Closed Rongjiehuang closed 3 years ago

Rongjiehuang commented 3 years ago

Hi, I have some questions about this work. In the function gen_from_file of gen_wavernn.py, we need to pass in a speaker_embedding extracted from a waveform. But in a TTS system, the vocoder usually only receives a mel-spectrogram from which to generate the waveform. Under such circumstances, how can we get the speaker_embedding? Thank you.

dipjyoti92 commented 3 years ago

In the zero-shot TTS pipeline, you need to provide a reference utterance from which the TTS system gathers the speaker identity. You can obtain the speaker embedding from that reference speech through the speaker encoder module and feed it to the vocoder; see the sketch below.
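A minimal sketch of that flow, assuming a GE2E-style speaker encoder and a gen_from_file-like entry point; the module names, checkpoint paths, and argument order below are illustrative and may not match this repository's actual API:

```python
import numpy as np
import librosa

# Assumed repo modules: a speaker encoder and the vocoder generation entry point.
from speaker_encoder import SpeakerEncoder   # illustrative name
from gen_wavernn import gen_from_file        # per the issue; signature may differ

# 1. Load a reference utterance from the target speaker.
ref_wav, sr = librosa.load("reference_speaker.wav", sr=16000)

# 2. Extract a fixed-dimensional speaker embedding (d-vector) from the reference audio.
encoder = SpeakerEncoder("encoder_checkpoint.pt")      # assumed checkpoint path
speaker_embedding = encoder.embed_utterance(ref_wav)   # e.g. a 256-dim vector

# 3. Condition the vocoder on that embedding while it synthesizes the waveform
#    from the mel-spectrogram produced by the TTS front end.
mel = np.load("tts_output_mel.npy")                    # mel from the TTS system
gen_from_file(voc_model, mel, speaker_embedding, save_path="output.wav")
```

The point is that the embedding does not have to come from the utterance being synthesized; any reference recording of the target speaker can be passed through the speaker encoder once, and the resulting vector conditions the vocoder for all mels generated for that speaker.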

Rongjiehuang commented 3 years ago

I see, thank u.