This is a PyTorch implementation of Microsoft's *FastSpeech 2: Fast and High-Quality End-to-End Text to Speech*.
Now supporting about 900 speakers in :fire: LibriTTS for multi-speaker text-to-speech.
This project supports two multi-speaker datasets:

* LibriTTS
* VCTK
Configurations are in:
Please modify `dataset` and `mfa_path` in hparams.
This repo uses MFA (Montreal Forced Aligner) v1; migrating to MFA v2 is a TODO item.
```
[DATASET]/wavs/<speaker>/<wav_files>
[DATASET]/txts/<speaker>/<txt_files>
```
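Before preprocessing, it can help to confirm the layout is correct. The helper below is a hypothetical sketch (not part of this repo) that checks that every wav under `wavs/<speaker>/` has a matching transcript under `txts/<speaker>/`:

``` shell
# Sketch of a pre-flight check (hypothetical helper, not part of this repo):
# verify every wav under ROOT/wavs/<speaker>/ has a matching transcript
# under ROOT/txts/<speaker>/ before running preprocess.py.
check_dataset() {
  root="$1"
  missing=0
  for wav in "$root"/wavs/*/*.wav; do
    [ -e "$wav" ] || continue                     # glob matched nothing
    speaker=$(basename "$(dirname "$wav")")
    stem=$(basename "$wav" .wav)
    if [ ! -f "$root/txts/$speaker/$stem.txt" ]; then
      echo "missing transcript for $speaker/$stem"
      missing=$((missing + 1))
    fi
  done
  echo "$missing missing"
}
```

For example, `check_dataset /storage/tts2021/LJSpeech-organized` should report `0 missing` if the organize step succeeded.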
* LJSpeech:
``` shell
# Run the script to organize LJSpeech first
python ./script/organizeLJ.py

python preprocess.py /storage/tts2021/LJSpeech-organized/wavs /storage/tts2021/LJSpeech-organized/txts ./processed/LJSpeech --prepare_mfa --mfa --create_dataset
```
* LibriTTS:
``` shell
python preprocess.py /storage/tts2021/LibriTTS/train-clean-360 /storage/tts2021/LibriTTS/train-clean-360 ./processed/LibriTTS --prepare_mfa --mfa --create_dataset
```

* VCTK:
``` shell
python preprocess.py /storage/tts2021/VCTK-Corpus/wav48/ /storage/tts2021/VCTK-Corpus/txt ./processed/VCTK --prepare_mfa --mfa --create_dataset
```
* LJSpeech:
``` shell
python train.py ./processed/LJSpeech --comment "Hello LJSpeech"
```

* LibriTTS:
``` shell
python train.py ./processed/LibriTTS --comment "Hello LibriTTS"
```

* VCTK:
``` shell
python train.py ./processed/VCTK --comment "Hello VCTK"
```
* `--ckpt_path`: path to the checkpoint to load
* `--output_dir`: directory for the synthesized audio files

``` shell
python synthesize.py --ckpt_path ./records/LJSpeech_2021-11-22-22:42/ckpt/checkpoint_125000.pth.tar --output_dir ./output
```
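Since checkpoint filenames include the training step, a small helper can pick the newest one automatically. This is a hypothetical sketch, not part of the repo:

``` shell
# Sketch (hypothetical helper, not part of this repo): print the most
# recently modified checkpoint in a run's ckpt/ directory, so the step
# number doesn't have to be typed by hand.
latest_ckpt() {
  ls -t "$1"/checkpoint_*.pth.tar 2>/dev/null | head -n 1
}

# Example usage (run directory name is just the example from above):
# python synthesize.py --ckpt_path "$(latest_ckpt ./records/LJSpeech_2021-11-22-22:42/ckpt)" --output_dir ./output
```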