I ran make_spect.py and make_metadata.py to preprocess the dataset (I used all speakers in VCTK). Then I used the pretrained Speaker Encoder model to extract speaker embeddings and trained the model. The final loss is about 0.03. Has anyone reproduced the result successfully? Could you help me? Thanks!
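For anyone comparing setups, this is roughly the sequence I ran (a minimal sketch; the checkpoint filename and any hard-coded paths inside the scripts are assumptions about the repo layout, not verified):

```bash
# Preprocess: convert VCTK wavs to mel-spectrograms
# (input/output directories are set inside the script)
python make_spect.py

# Build the training metadata: attaches a speaker embedding per speaker
# using the pretrained Speaker Encoder checkpoint (path assumed)
python make_metadata.py

# Train the conversion model on the generated spectrograms + metadata
python main.py
```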