This is a PyTorch implementation of Microsoft's *FastSpeech 2: Fast and High-Quality End-to-End Text to Speech*.
Now supporting about 900 speakers in :fire: LibriTTS for multi-speaker text-to-speech.
This project supports two multi-speaker datasets:

* LibriTTS
* VCTK
Configurations are in:
Please modify `dataset` and `mfa_path` in hparams.
This repo uses MFA (Montreal Forced Aligner) v1; migrating to MFA v2 is a TODO item.
```
[DATASET]/wavs/<speaker>/<wav_files>
[DATASET]/txts/<speaker>/<txt_files>
```
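Before preprocessing, it can help to confirm the layout is correct. The helper below is a hypothetical sketch (not part of this repo) that checks that every wav under `wavs/<speaker>/` has a matching transcript under `txts/<speaker>/`:

``` shell
# Sketch of a pre-flight check (hypothetical helper, not part of this repo):
# verify every wav under ROOT/wavs/<speaker>/ has a matching transcript
# under ROOT/txts/<speaker>/ before running preprocess.py.
check_dataset() {
  root="$1"
  missing=0
  for wav in "$root"/wavs/*/*.wav; do
    [ -e "$wav" ] || continue                     # glob matched nothing
    speaker=$(basename "$(dirname "$wav")")
    stem=$(basename "$wav" .wav)
    if [ ! -f "$root/txts/$speaker/$stem.txt" ]; then
      echo "missing transcript for $speaker/$stem"
      missing=$((missing + 1))
    fi
  done
  echo "$missing missing"
}
```

For example, `check_dataset /storage/tts2021/LJSpeech-organized` should report `0 missing` if the organize step succeeded.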
* LJSpeech:
``` shell
# Run the script to organize LJSpeech first
python ./script/organizeLJ.py

python preprocess.py /storage/tts2021/LJSpeech-organized/wavs /storage/tts2021/LJSpeech-organized/txts ./processed/LJSpeech --prepare_mfa --mfa --create_dataset
```
* LibriTTS:
``` shell
python preprocess.py /storage/tts2021/LibriTTS/train-clean-360 /storage/tts2021/LibriTTS/train-clean-360 ./processed/LibriTTS --prepare_mfa --mfa --create_dataset
```

* VCTK:
``` shell
python preprocess.py /storage/tts2021/VCTK-Corpus/wav48/ /storage/tts2021/VCTK-Corpus/txt ./processed/VCTK --prepare_mfa --mfa --create_dataset
```
* LJSpeech:
``` shell
python train.py ./processed/LJSpeech --comment "Hello LJSpeech"
```

* LibriTTS:
``` shell
python train.py ./processed/LibriTTS --comment "Hello LibriTTS"
```

* VCTK:
``` shell
python train.py ./processed/VCTK --comment "Hello VCTK"
```
* `--ckpt_path`: path to the checkpoint to load
* `--output_dir`: directory for the synthesized audio files

``` shell
python synthesize.py --ckpt_path ./records/LJSpeech_2021-11-22-22:42/ckpt/checkpoint_125000.pth.tar --output_dir ./output
```
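Since checkpoint filenames include the training step, a small helper can pick the newest one automatically. This is a hypothetical sketch, not part of the repo:

``` shell
# Sketch (hypothetical helper, not part of this repo): print the most
# recently modified checkpoint in a run's ckpt/ directory, so the step
# number doesn't have to be typed by hand.
latest_ckpt() {
  ls -t "$1"/checkpoint_*.pth.tar 2>/dev/null | head -n 1
}

# Example usage (run directory name is just the example from above):
# python synthesize.py --ckpt_path "$(latest_ckpt ./records/LJSpeech_2021-11-22-22:42/ckpt)" --output_dir ./output
```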