coqui-ai / TTS

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
http://coqui.ai
Mozilla Public License 2.0
35.25k stars, 4.3k forks

[Bug] VCTK Fast Pitch model is not compatible with the vocoder #907

Closed by erogol 2 years ago

erogol commented 3 years ago

The vocoder model designated for the VCTK FastPitch model is not compatible and it produces pure noise.

We need to train a new compatible vocoder or update the FastPitch model.

Until then, it is recommended to use the Griffin-Lim vocoder by passing an empty vocoder name or by setting the field in .models.json to None.
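A minimal sketch of the second workaround, for illustration only. It assumes the vocoder is referenced by a `default_vocoder` key on the model entry and that entries are nested by type, language, dataset and model name; the key name, the nesting, the sample values and the helper `disable_default_vocoder` are all assumptions, not something confirmed in this thread.

```python
import json


def disable_default_vocoder(models, model_path, key="default_vocoder"):
    """Set the vocoder field of one model entry to None (JSON null).

    `models` is a dict in the assumed shape of .models.json and
    `model_path` is a slash-separated path to the model entry.
    """
    entry = models
    for part in model_path.split("/"):
        entry = entry[part]  # raises KeyError if the path does not exist
    entry[key] = None
    return models


# Illustrative stand-in for the relevant part of .models.json (assumed layout).
models = {
    "tts_models": {
        "en": {
            "vctk": {
                "fast_pitch": {
                    "default_vocoder": "vocoder_models/en/vctk/hifigan_v2"
                }
            }
        }
    }
}

patched = disable_default_vocoder(models, "tts_models/en/vctk/fast_pitch")
print(json.dumps(patched, indent=4))
```

With no vocoder set for the entry, synthesis should fall back to Griffin-Lim as described above.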

skol101 commented 2 years ago

Even though they mention only WaveGlow in their paper, from here https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_en_fastpitch it's evident that HifiGan can be used as a vocoder.

I'm sorry, I could be stupid, but here https://github.com/coqui-ai/TTS/blob/0c2150a6c10060c9427d7940bf845d61b88a7c09/TTS/vocoder/README.md I don't see any instructions on how a vocoder is tied to a TTS model. How can one train a 'designated' vocoder model for a TTS model?

skol101 commented 2 years ago

Btw, the hifigan_v2 vocoder that's used in the project is just 3.5 MB. Is this ok, @Edresson?

"vctk": {
    "hifigan_v2": {
        "description": "Finetuned and intended to be used with tts_models/en/vctk/sc-glow-tts",
        "github_rls_url": "https://coqui.gateway.scarf.sh/v0.0.12/vocoder_model--en--vctk--hifigan_v2.zip",
        "commit": "2f07160",
        "author": "Edresson Casanova",
        "license": "",
        "contact": ""
    }
},

erogol commented 2 years ago

It has only the generator network.

erogol commented 2 years ago

> Even though they mention only WaveGlow in their paper, from here https://catalog.ngc.nvidia.com/orgs/nvidia/teams/nemo/models/tts_en_fastpitch it's evident that HifiGan can be used as a vocoder.
>
> I'm sorry I could be stupid but here https://github.com/coqui-ai/TTS/blob/0c2150a6c10060c9427d7940bf845d61b88a7c09/TTS/vocoder/README.md I don't see instruction of how a vocoder is tied anyhow to a TTS model. How can one train a 'designated' vocoder model for a TTS model?

Train the vocoder on the same dataset using the same audio parameters.
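One way to check the "same audio parameters" condition before training is to compare the audio settings of the two configs. This is a rough sketch, not the project's own tooling: the helper `audio_mismatches`, the list of keys and the sample values are assumptions chosen for illustration, so adjust them to whatever your configs actually contain.

```python
# Keys assumed to describe the mel-spectrogram setup in both configs.
AUDIO_KEYS = (
    "sample_rate",
    "num_mels",
    "fft_size",
    "hop_length",
    "win_length",
    "mel_fmin",
    "mel_fmax",
)


def audio_mismatches(tts_audio, vocoder_audio, keys=AUDIO_KEYS):
    """Return {key: (tts_value, vocoder_value)} for every key that differs."""
    return {
        key: (tts_audio.get(key), vocoder_audio.get(key))
        for key in keys
        if tts_audio.get(key) != vocoder_audio.get(key)
    }


# Illustrative values only; load the real "audio" sections from your configs.
tts_audio = {
    "sample_rate": 22050,
    "num_mels": 80,
    "fft_size": 1024,
    "hop_length": 256,
    "win_length": 1024,
    "mel_fmin": 0,
    "mel_fmax": 8000,
}
vocoder_audio = dict(tts_audio, hop_length=275, mel_fmax=None)

for key, (tts_value, voc_value) in audio_mismatches(tts_audio, vocoder_audio).items():
    print(f"{key}: TTS={tts_value} vocoder={voc_value}")
```

Any key reported here means the vocoder would be trained on spectrograms that differ from what the TTS model produces, which is one plausible cause of the noisy output described in this issue.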

skol101 commented 2 years ago

God knows I tried, but distributed training doesn't work at all, and single-GPU training doesn't improve.

[Screenshot from 2021-11-30 17-21-47]

skol101 commented 2 years ago

I'll roll back to CoquiTTS 3.x to see if it makes any difference in HiFiGan training.

skol101 commented 2 years ago

No improvement.

[Screenshot from 2021-12-03 13-05-18]

skol101 commented 2 years ago

@erogol no help here?

erogol commented 2 years ago

You don't need to call my handle. This is not a support channel, after all.

How should I help you just by looking at your TensorBoard?

It is also not relevant to this issue. Please create a new thread.

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. You might also look at our discussion channels.