NVIDIA / flowtron

Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
https://nv-adlr.github.io/Flowtron
Apache License 2.0
887 stars, 177 forks

Change inference speaker to custom dataset speaker #132

Open urpeter opened 2 years ago

urpeter commented 2 years ago

Hello, I fine-tuned the libritts2k model on some custom data of mine (roughly 15 minutes of speech). The output from the inference demo is pretty good, but it doesn't sound like the voice in the custom data. Do I have to fine-tune the model longer? The best results are typically after 5000 iterations. Or do I have to change some code in the inference.py file? Or do I have a grave misunderstanding of how to produce a custom dataset voice? Any advice would be welcome, thank you.

Alexey322 commented 2 years ago

@urpeter How did you fine-tune for few-shot synthesis if the libritts2k checkpoint does not contain layers for the current model?

letrongan commented 2 years ago

@urpeter Can you share your config? I have some problems when fine-tuning my model. Thanks.