I'm trying to train on VCTK data at 24 kHz from scratch for higher quality.
I set SAMPLE_RATE = 24000 in recipes/vctk/yourtts/train_yourtts.py.
However, it still loads the AudioProcessor at 16 kHz and computes the speaker embeddings at 16 kHz.
Where is this 16 kHz sample rate hard-coded?
Also, do I need to train a vocoder before training the TTS model, or does TTS training not rely on a vocoder?
Thanks!