CODEJIN / NaturalSpeech2

MIT License

NaturalSpeech 2

Shen, K., Ju, Z., Tan, X., Liu, Y., Leng, Y., He, L., ... & Bian, J. (2023). NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers. arXiv preprint arXiv:2304.09116.

Modifications from Paper

Supported dataset

Hyper parameters

Before proceeding, please set the pattern, inference, and checkpoint paths in Hyper_Parameters.yaml according to your environment.
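A quick sanity check before running anything can catch a path that was left at its default. The following is only a minimal sketch: it assumes the YAML has already been loaded into a nested dict, and it guesses that path settings are string values whose key contains "path". The actual key names in Hyper_Parameters.yaml may differ, and both helper names (`iter_path_settings`, `missing_paths`) are hypothetical, not part of this repository.

```python
from pathlib import Path
from typing import Any, Iterator, List, Tuple


def iter_path_settings(config: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for string settings whose key mentions 'path'.

    Assumption: path-like settings can be recognised by their key name.
    """
    if isinstance(config, dict):
        for key, value in config.items():
            dotted = f"{prefix}.{key}" if prefix else str(key)
            if isinstance(value, str) and "path" in str(key).lower():
                yield dotted, value
            else:
                # Recurse into nested sections; scalars fall through silently.
                yield from iter_path_settings(value, dotted)
    elif isinstance(config, list):
        for index, item in enumerate(config):
            yield from iter_path_settings(item, f"{prefix}[{index}]")


def missing_paths(config: Any) -> List[str]:
    """Return the dotted keys whose configured path does not exist on disk."""
    return [key for key, value in iter_path_settings(config) if not Path(value).exists()]


# Usage (requires PyYAML, which is assumed to be installed already):
#   import yaml
#   with open("Hyper_Parameters.yaml") as f:
#       print(missing_paths(yaml.safe_load(f)))
```

Output locations such as a checkpoint or inference directory may legitimately not exist yet, so a reported key is a prompt to double-check the setting rather than an error.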

Generate pattern

python Pattern_Generate.py [parameters]

Parameters

About phonemizer

Command

Single GPU

python Train.py -hp <path> -s <int>

Multi GPU

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 OMP_NUM_THREADS=32 python -m torch.distributed.launch --nproc_per_node=8 Train.py --hyper_parameters Hyper_Parameters.yaml --port 54322
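The command above is written for eight GPUs. On a machine with a different GPU count, `CUDA_VISIBLE_DEVICES` and `--nproc_per_node` need to change together. As a minimal sketch, the helper below rebuilds the same command from a list of GPU ids. It only reuses the flags and environment variables shown above; the function names `build_launch` and `as_shell_line` are hypothetical, and the default of 32 OpenMP threads is simply copied from the example rather than being a recommendation.

```python
import shlex
from typing import Dict, List, Sequence, Tuple


def build_launch(
    gpu_ids: Sequence[int],
    hyper_parameters: str = "Hyper_Parameters.yaml",
    port: int = 54322,
    omp_threads: int = 32,
) -> Tuple[Dict[str, str], List[str]]:
    """Return (environment variables, argv) mirroring the multi-GPU command."""
    if not gpu_ids:
        raise ValueError("at least one GPU id is required")
    env = {
        "CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids),
        "OMP_NUM_THREADS": str(omp_threads),
    }
    argv = [
        "python", "-m", "torch.distributed.launch",
        # One process per visible GPU.
        f"--nproc_per_node={len(gpu_ids)}",
        "Train.py",
        "--hyper_parameters", hyper_parameters,
        "--port", str(port),
    ]
    return env, argv


def as_shell_line(env: Dict[str, str], argv: List[str]) -> str:
    """Render the environment and argv as a single shell command line."""
    prefix = " ".join(f"{key}={shlex.quote(value)}" for key, value in env.items())
    return f"{prefix} {shlex.join(argv)}"


if __name__ == "__main__":
    # Example: use only GPUs 0 and 1.
    print(as_shell_line(*build_launch([0, 1])))
```

To launch from Python instead of printing, the same pair can be passed to `subprocess.run(argv, env={**os.environ, **env})`.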

Checkpoint

Dataset | Sample rate (Hz) | Link
VCTK | 22050 | Google drive