TechInterMezzo closed this issue 1 year ago.
The biggest problem right now is the data. For English, we have a couple of good datasets, but for German, all of the open datasets are somewhat flawed. Depending on the speaker embedding used, however, you can get pretty decent German speech out of the model.
If you want to train something yourself, my recommendation is the same as always: if you have more than 5 hours of high-quality data, train from scratch; if you have less than that, finetune the pretrained model.
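A minimal sketch of that rule of thumb, assuming the dataset is a directory of PCM WAV files. The function names are hypothetical and not part of the project. Note that it only measures how much audio you have; it cannot tell you whether the data is high quality, which is the other half of the recommendation.

```python
import wave
from pathlib import Path

# Threshold from the recommendation: more than 5 hours of high-quality
# data -> train from scratch, otherwise finetune the pretrained model.
FROM_SCRATCH_THRESHOLD_HOURS = 5.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def total_audio_hours(dataset_dir):
    """Sum the duration of every .wav file under dataset_dir (recursively)."""
    seconds = sum(wav_duration_seconds(p) for p in Path(dataset_dir).rglob("*.wav"))
    return seconds / 3600.0


def choose_strategy(hours):
    """Pick a training strategy, assuming the audio is already high quality."""
    if hours > FROM_SCRATCH_THRESHOLD_HOURS:
        return "train from scratch"
    return "finetune pretrained model"
```

For example, `choose_strategy(total_audio_hours("path/to/german_dataset"))` (the path is a placeholder) returns `"finetune pretrained model"` for a dataset of exactly 5 hours, since the recommendation asks for more than 5.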
I tried run_interactive_demo.py with LANGUAGE set to de, and it sounds like any old TTS: a bit robotic, monotone, and unnatural. Can this be improved with further training? Should I look for a better German dataset? Should I train from scratch or finetune the existing model?