LuckyBian / EMOTTS

A VITS-based TTS model that controls the emotion of the output speech through natural-language prompts and the speaker identity through reference audio.