Closed erogol closed 2 years ago
Even though they mention only WaveGlow in their paper, the model page at https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_en_fastpitch makes it evident that HiFi-GAN can be used as a vocoder too.
Sorry if I'm missing something obvious, but I don't see any instructions in https://github.com/coqui-ai/TTS/blob/0c2150a6c10060c9427d7940bf845d61b88a7c09/TTS/vocoder/README.md on how a vocoder is tied to a TTS model. How can one train a 'designated' vocoder model for a TTS model?
Btw, the hifigan_v2 vocoder that's used in the project is just 3.5 MB. Is this OK, @Edresson?
"vctk": {
    "hifigan_v2": {
        "description": "Finetuned and intended to be used with tts_models/en/vctk/sc-glow-tts",
        "github_rls_url": "https://coqui.gateway.scarf.sh/v0.0.12/vocoder_model--en--vctk--hifigan_v2.zip",
        "commit": "2f07160",
        "author": "Edresson Casanova",
        "license": "",
        "contact": ""
    }
},
It has only the generator network.
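As a rough sanity check on the 3.5 MB figure, the arithmetic can be sketched as below. It assumes the weights are stored as 32-bit floats and that the V2 generator has roughly 0.92M parameters, which is the figure I recall from the HiFi-GAN paper; neither number is stated in this thread, and `checkpoint_size_mib` is a hypothetical helper.

```python
def checkpoint_size_mib(num_params, bytes_per_param=4):
    """Estimate the on-disk size of raw weights in MiB.

    Assumes every parameter is stored at the same width
    (4 bytes for float32) and ignores file-format overhead.
    """
    return num_params * bytes_per_param / (1024 ** 2)


# Assumption: ~0.92M parameters for the HiFi-GAN V2 generator.
HIFIGAN_V2_GENERATOR_PARAMS = 920_000

size = checkpoint_size_mib(HIFIGAN_V2_GENERATOR_PARAMS)
print(f"estimated generator-only size: {size:.2f} MiB")
```

Under those assumptions the estimate lands close to 3.5 MiB, which is consistent with a file that holds only the generator and no discriminator or optimizer state.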
In reply to "How can one train a 'designated' vocoder model for a TTS model?":
Train the vocoder on the same dataset as the TTS model, using the same audio parameters.
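One way to check the "same audio parameters" part before training is to diff the audio sections of the two configs. This is only a sketch: the key names below are assumptions modeled on typical mel-spectrogram settings, the values are made up, and `audio_mismatches` is a hypothetical helper, not part of the TTS library.

```python
# Assumed key names; adjust them to whatever your config files actually use.
AUDIO_KEYS = (
    "sample_rate",
    "fft_size",
    "win_length",
    "hop_length",
    "num_mels",
    "mel_fmin",
    "mel_fmax",
)


def audio_mismatches(tts_audio, vocoder_audio, keys=AUDIO_KEYS):
    """Return {key: (tts_value, vocoder_value)} for every key that differs."""
    return {
        key: (tts_audio.get(key), vocoder_audio.get(key))
        for key in keys
        if tts_audio.get(key) != vocoder_audio.get(key)
    }


# Illustrative values only, not taken from any released model.
tts_audio = {
    "sample_rate": 22050,
    "fft_size": 1024,
    "win_length": 1024,
    "hop_length": 256,
    "num_mels": 80,
    "mel_fmin": 0.0,
    "mel_fmax": 8000.0,
}
vocoder_audio = dict(tts_audio, mel_fmax=None)

for key, (tts_value, voc_value) in audio_mismatches(tts_audio, vocoder_audio).items():
    print(f"{key}: TTS={tts_value!r} vocoder={voc_value!r}")
```

Any key that shows up in the result means the vocoder would be trained on spectrograms that differ from what the TTS model produces.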
God knows I tried, but distributed training doesn't work at all, and single-GPU training doesn't improve.
I'll roll back to CoquiTTS 3.x to see if it makes any difference in HiFiGan training.
No improvement.
@erogol no help here?
You don't need to call my handle. This is not a support channel, after all.
How should I help you just by looking at your TensorBoard?
It is also not relevant to the issue. Please create a new thread.
The vocoder model designated for the VCTK FastPitch model is not compatible and it produces pure noise.
We need to train a new compatible vocoder or update the FastPitch model.
Until then, it is recommended to use the Griffin-Lim vocoder, either by passing an empty vocoder name or by setting the field in .models.json to None.
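For the second option, a minimal sketch of clearing the vocoder field in a models file could look like the following. The field name `default_vocoder`, the nested entry layout, and the helper `disable_default_vocoder` are all assumptions for illustration; check your own .models.json before editing it. Note that Python's None is written out as JSON null.

```python
import json


def disable_default_vocoder(models_json_text, model_path, field="default_vocoder"):
    """Return the JSON text with `field` set to null for one model entry.

    `model_path` is the list of nested keys leading to the model entry.
    The field name is an assumption, not confirmed by this thread.
    """
    models = json.loads(models_json_text)
    entry = models
    for part in model_path:
        entry = entry[part]  # raises KeyError if the path does not exist
    entry[field] = None  # serialized as JSON null
    return json.dumps(models, indent=4)


# Hypothetical fragment shaped like a models file.
example = """
{
    "tts_models": {
        "en": {
            "vctk": {
                "fast_pitch": {
                    "default_vocoder": "vocoder_models/en/vctk/hifigan_v2"
                }
            }
        }
    }
}
"""

patched = disable_default_vocoder(example, ["tts_models", "en", "vctk", "fast_pitch"])
print(patched)
```

Working on the text rather than the file keeps the sketch side-effect free; writing `patched` back to disk is left to the caller.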