EMOTTS: Multilingual Emotion-Controlled Voice Cloning Text-to-Speech System

A VITS-based TTS system that controls the emotion of the output speech through natural language and the speaker identity through reference audio.

<img src="img/emo-tts.png" style="float: left; margin-right: 0px;" />

Create Env

```bash
conda create -n emo python=3.8
conda activate emo
```

Install packages

```bash
pip install -r requirements.txt
python env.py
```

Download Pre-trained Model

Download the pre-trained model from this link, then put the files into /chinese-roberta-wwm-ext.
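Once downloaded, the checkpoint can be loaded straight from that directory with Hugging Face transformers. A minimal sketch, assuming the standard transformers API and using mean pooling as an illustrative way to turn an emotion description into a vector (the repo's own scripts define the actual usage):

```python
from transformers import BertModel, BertTokenizer

# chinese-roberta-wwm-ext ships with BERT-style tokenizer/model classes,
# so it is loaded with BertTokenizer/BertModel rather than the RoBERTa classes.
tokenizer = BertTokenizer.from_pretrained("chinese-roberta-wwm-ext")
model = BertModel.from_pretrained("chinese-roberta-wwm-ext")

# Embed a natural-language emotion description (mean pooling is illustrative).
inputs = tokenizer("非常开心", return_tensors="pt")  # "very happy"
hidden = model(**inputs).last_hidden_state          # (1, seq_len, hidden_size)
emotion_embedding = hidden.mean(dim=1)              # (1, hidden_size)
```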

Collecting Data

Collect the data by following this.

Preprocessing

Use this code to complete the following preprocessing steps (a sketch of steps 1 and 2 follows the commands below):

  1. Convert the audio to a single channel, resample it to 22050 Hz, and save it in WAV format.
  2. Merge the audio and slice it into 10-second segments.
  3. Use ASR to transcribe the speech into text.
  4. Store the audio, emotion, and text in three folders with corresponding file names.
```bash
# Store each audio path with its corresponding text and emotion,
# then split the data into training and validation sets.
python getdata.py
python split.py
```
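Steps 1 and 2 can be reproduced with librosa and soundfile; a minimal sketch under those assumptions (getdata.py is the authoritative implementation, and the paths here are illustrative):

```python
import librosa
import soundfile as sf

# Load the raw audio as mono and resample it to 22050 Hz (illustrative path).
wav, sr = librosa.load("raw/example.mp3", sr=22050, mono=True)

# Slice the audio into 10-second segments and save each one as WAV.
seg_len = 10 * sr
for i in range(0, len(wav), seg_len):
    segment = wav[i:i + seg_len]
    sf.write(f"wavs/example_{i // seg_len:04d}.wav", segment, sr)
```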

Build Monotonic Alignment Search

```bash
cd monotonic_align
python setup.py build_ext --inplace
cd ..
```
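This compiles the Cython extension that VITS uses for monotonic alignment search (MAS): a dynamic program that, given per-frame log-likelihoods for each text token, finds the monotonic, non-skipping alignment between text tokens and latent frames with the highest total log-likelihood. A pure-NumPy sketch of the idea (the compiled extension is the fast version actually used in training):

```python
import numpy as np

def monotonic_alignment_search(log_p):
    """Best monotonic alignment for a (text_len, frame_len) matrix of
    per-frame log-likelihoods, as in Glow-TTS/VITS."""
    t_text, t_frame = log_p.shape
    neg_inf = -1e9
    Q = np.full((t_text, t_frame), neg_inf)
    Q[0, 0] = log_p[0, 0]
    for j in range(1, t_frame):
        for i in range(min(j + 1, t_text)):
            stay = Q[i, j - 1]                               # token i also covers frame j-1
            advance = Q[i - 1, j - 1] if i > 0 else neg_inf  # token advanced at frame j
            Q[i, j] = log_p[i, j] + max(stay, advance)
    # Backtrack: assign each frame to exactly one text token.
    alignment = np.zeros(t_frame, dtype=np.int64)
    i = t_text - 1
    for j in range(t_frame - 1, -1, -1):
        alignment[j] = i
        if i > 0 and (i == j or Q[i - 1, j - 1] > Q[i, j - 1]):
            i -= 1
    return alignment
```

Here alignment[j] is the text token responsible for frame j; counting the occurrences of each index yields the durations used to train the duration predictor.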

Training

```bash
python train.py -c path/to/json -m model
```

Here -c is the path to the training configuration JSON and -m names the model run, i.e. the directory where checkpoints are written.

Inference

```bash
python infer.py
```

Inference conditions on the input text, a natural-language emotion description, and a reference audio clip for the target speaker; see infer.py for the exact arguments.
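In VITS-style systems, a speaker reference clip is typically consumed as a mel spectrogram by a reference encoder. A generic sketch of preparing one (the n_fft/hop_length/n_mels settings are illustrative, not necessarily this repo's exact values):

```python
import librosa
import numpy as np

# Load the reference speaker clip at the training sample rate.
ref_wav, sr = librosa.load("reference_speaker.wav", sr=22050, mono=True)

# Log-mel spectrogram as the speaker reference feature
# (n_fft/hop_length/n_mels are illustrative values).
mel = librosa.feature.melspectrogram(
    y=ref_wav, sr=sr, n_fft=1024, hop_length=256, win_length=1024, n_mels=80
)
log_mel = np.log(np.clip(mel, 1e-5, None))
```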