Support for zero-shot multi-speaker TTS?

🚀 Feature Description

First of all, thank you for all the great work. I want to fine-tune a TTS model to generate English speech with a Chinese accent. I have collected a dataset of 80 hours of speech with annotations from 8 to 10 speakers. I would like to fine-tune a zero-shot TTS model (end-to-end or not), but I cannot find anything in the documentation on how to do so.

Solution

I would really appreciate it if someone could point me in the right direction to achieve this with this repo.

PS: it's a bit urgent.

coqui-ai / TTS

Support for zero-shot multi-speaker TTS? #1352