Feature Description
First of all, thank you for all the great work.
I am looking to fine-tune a TTS model to produce Chinese-accented English. I have collected a dataset of 80 hours of speech with annotations from 8 to 10 speakers. I want to fine-tune a zero-shot TTS model (end-to-end or not), but I cannot find anything in the documentation on how to do so.
Solution
I would really appreciate it if someone could point me in the right direction to achieve this with this repo.
P.S. This is a bit urgent.