espeak phoneme tokenization - failed experiment?

daniilrobnikov / vits2

VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design

https://vits-2.github.io/demo/

MIT License

Teravus opened this issue 9 months ago (status: open)

Hey there

I trained a model to 42,000 steps on master. It does sound like the voice I trained it on, but the phonemes sound like eSpeak's EN-US voice.

Should I just give it more time, or go back to an earlier revision where the vocabulary is a static text list that doesn't use espeak?
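For comparison, here is a minimal sketch of what a "static text vocabulary" tokenizer looks like: a fixed symbol list mapped directly to IDs, with no phonemizer in the loop. The symbol set and the names `SYMBOLS`, `text_to_sequence`, and `sequence_to_text` are hypothetical and are not taken from the vits2 code. With espeak-based tokenization, the text would first be converted to IPA phonemes (for example via the `phonemizer` package's espeak backend) and the symbol table would hold IPA symbols instead of letters, which is why the model ends up learning espeak's pronunciations.

```python
# Hypothetical static character vocabulary (not the vits2 symbol set).
# Every symbol the model can see is listed up front, so no external
# phonemizer such as espeak is involved.
PAD = "_"
PUNCTUATION = ';:,.!?"\' '
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"

SYMBOLS = [PAD] + list(PUNCTUATION) + list(LETTERS)
SYMBOL_TO_ID = {s: i for i, s in enumerate(SYMBOLS)}


def text_to_sequence(text: str) -> list[int]:
    """Map raw text to symbol IDs, silently skipping characters not in the vocab."""
    return [SYMBOL_TO_ID[ch] for ch in text if ch in SYMBOL_TO_ID]


def sequence_to_text(ids: list[int]) -> str:
    """Inverse mapping, handy for checking what the model actually receives."""
    return "".join(SYMBOLS[i] for i in ids)


if __name__ == "__main__":
    ids = text_to_sequence("Hi!")
    print(ids)
    print(sequence_to_text(ids))
```

The trade-off is that a character vocabulary makes the model learn pronunciation from the training audio itself, which usually needs more data, while phoneme input hands it a pronunciation that is only as good as the phonemizer's.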