🧑‍🎤 Expressive Text-to-Speech
This repository is a fork of Coqui-AI's 🐸TTS, used for research on expressive TTS in our AI-Unicamp-CPQD group. The original code is kept in the "main" branch, which is not the default view of this repository.
We use the "unicamp" branch as our default branch, while "main" stays in sync with the original project. You can see the original README.md here.
About the group
We are an expressive TTS research group located at Unicamp and CPQD (Brazil).
Implementations
Expressive Models
Expressive Datasets
Style Encoders
- Look-Up
- Reference Encoder (Coarse/Fine-Grained)
- GST
- VAE
- VQ-VAE
- VAE+Flow
- Diffusion
Disentanglement Blocks
- Style Classifier
- Speaker Classifier + GRL (Gradient Reversal Layer)
Style Reference Features
- Pitch
- Energy
- Mel-Spectrogram
Aggregation Types
Enhancing Losses
- Orthogonal Loss
- CLIP Loss
- Cycle Consistency Loss (*)
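As an illustration of what an orthogonality-style loss can look like, here is a minimal pure-Python sketch. It assumes one common formulation, the squared cosine similarity between a style embedding and a speaker embedding, which is zero when the two vectors are orthogonal. The function names `orthogonal_loss` and `batch_orthogonal_loss` are hypothetical and are not taken from this repository; the actual implementation here may differ.

```python
import math


def orthogonal_loss(style, speaker):
    """Squared cosine similarity between two embedding vectors.

    Assumed formulation (not necessarily the one used in this repo):
    the value is 0.0 for orthogonal vectors and 1.0 for collinear ones.
    """
    dot = sum(a * b for a, b in zip(style, speaker))
    norm_style = math.sqrt(sum(a * a for a in style))
    norm_speaker = math.sqrt(sum(b * b for b in speaker))
    # A zero vector has no direction, so treat it as contributing no penalty.
    if norm_style == 0.0 or norm_speaker == 0.0:
        return 0.0
    cos = dot / (norm_style * norm_speaker)
    return cos * cos


def batch_orthogonal_loss(style_batch, speaker_batch):
    """Mean of the per-example penalties over a batch of embedding pairs."""
    losses = [orthogonal_loss(s, p) for s, p in zip(style_batch, speaker_batch)]
    return sum(losses) / len(losses)
```

In a real training setup this penalty would be computed on tensors and added to the main TTS loss with a weighting factor, so that the style encoder is pushed to carry information that does not overlap with speaker identity.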