# CODEJIN / Glow_TTS

An implementation of the GlowTTS model. Several additional modes are provided: speaker embedding, prosody encoder (GST), and gradient reversal.
MIT License

## Multispeaker GlowTTS

## Requirements

## Structure

## Vanilla mode (Single speaker GlowTTS)

### Training

### Inference

## Speaker embedding mode

### Training

### Inference
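In this mode the model is conditioned on a per-speaker vector taken from a learned lookup table (LUT) indexed by speaker ID. A minimal sketch of the idea in plain Python (class and parameter names are illustrative, not the repo's actual code):

```python
import random

class SpeakerLUT:
    """Lookup-table speaker embedding, illustrative sketch:
    one trainable vector per speaker ID."""
    def __init__(self, num_speakers, dim, seed=0):
        rng = random.Random(seed)
        # Each row is a randomly initialized embedding vector;
        # training would update only the rows of speakers seen in a batch.
        self.table = [[rng.gauss(0.0, 0.02) for _ in range(dim)]
                      for _ in range(num_speakers)]

    def __call__(self, speaker_id):
        # The forward pass is a plain row lookup.
        return self.table[speaker_id]

lut = SpeakerLUT(num_speakers=4, dim=8)
speaker_vector = lut(2)  # conditioning vector for speaker 2
```

In a framework this is a single embedding layer; the point is that each speaker's identity is a trainable row, so unseen speakers have no vector, which is what the gradient reversal mode below tries to work around.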

## Prosody encoding mode (GST GlowTTS)

### Training

### Inference
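In Global Style Tokens (GST), a reference encoder summarizes a reference utterance into a query vector, which attends over a small bank of learned style tokens; the style embedding is the attention-weighted sum of the tokens. A minimal single-head dot-product sketch (illustrative only, not the repo's code):

```python
import math

def gst_style_vector(query, tokens):
    """Illustrative GST attention: `query` is the reference-encoder
    output, `tokens` is the learned style-token bank."""
    # Dot-product attention score between the query and each token.
    scores = [sum(q * t for q, t in zip(query, token)) for token in tokens]
    # Numerically stabilized softmax turns scores into weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Style embedding: attention-weighted sum of the style tokens.
    dim = len(tokens[0])
    return [sum(w * token[i] for w, token in zip(weights, tokens))
            for i in range(dim)]

# Two orthogonal toy tokens; the query is closer to the first one.
style = gst_style_vector([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

At inference, the style embedding can also be formed directly from hand-picked token weights instead of a reference utterance.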

## Gradient reversal mode (Voice cloning GlowTTS - Failed)

### Training

### Inference
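The adversarial idea behind this mode: a gradient reversal layer is the identity in the forward pass but multiplies the incoming gradient by a negative factor in the backward pass, so a speaker classifier placed behind it pushes the encoder to *remove* speaker information. A minimal scalar sketch (illustrative, not the repo's code):

```python
class GradReverse:
    """Gradient reversal layer, scalar sketch: forward is the
    identity, backward negates and scales the gradient by lambda."""
    def __init__(self, lambd=1.0):
        self.lambd = lambd

    def forward(self, x):
        # Downstream layers see the input unchanged.
        return x

    def backward(self, grad_output):
        # The gradient flowing back to the encoder is reversed,
        # turning the classifier's objective into an adversary.
        return -self.lambd * grad_output

layer = GradReverse(lambd=0.5)
out = layer.forward(3.0)      # identity: 3.0
grad = layer.backward(2.0)    # reversed: -1.0
```

In an autograd framework this is a custom function with the negated backward; the README marks this mode as failed, so treat the sketch as documentation of the attempt, not a working recipe.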

## Used dataset

| Single | Multi | Dataset | Dataset address |
|--------|-------|---------|-----------------|
| O | O | LJSpeech | https://keithito.com/LJ-Speech-Dataset/ |
| X | X | BC2013 | http://www.cstr.ed.ac.uk/projects/blizzard/ |
| X | O | CMU Arctic | http://www.festvox.org/cmu_arctic/index.html |
| X | O | VCTK | https://datashare.is.ed.ac.uk/handle/10283/2651 |
| X | X | LibriTTS | https://openslr.org/60/ |

## Hyper parameters

Before proceeding, please set the pattern, inference, and checkpoint paths in `Hyper_Parameters.yaml` according to your environment.
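The exact keys depend on the repo version, so check the file itself; purely as an illustration (all key names and paths below are hypothetical), the entries to adjust look like:

```yaml
# Hypothetical key names -- verify against the actual Hyper_Parameters.yaml.
Train:
  Train_Pattern:
    Path: ./patterns/train      # where Pattern_Generate.py wrote training patterns
  Eval_Pattern:
    Path: ./patterns/eval       # held-out patterns for evaluation
Inference_Path: ./results/inference    # synthesized audio is written here
Checkpoint_Path: ./results/checkpoint  # checkpoints are saved to / loaded from here
```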

## Generate pattern

### Command

```
python Pattern_Generate.py [parameters]
```

### Parameters

At least one of the datasets must be used.

## Run

### Command

```
python Train.py -s <int>
```

## Inference

## Result

Please see the demo site.

## Trained checkpoint

| Mode | Dataset | Trained steps | Link |
|------|---------|---------------|------|
| Vanilla | LJ | 100000 | Link (broken) |
| SE & LUT | LJ + CUMA | 100000 | Link |
| SE & LUT | LJ + VCTK | 100000 | Link |
| PE | LJ + CUMA | 100000 | Link |
| PE | LJ + VCTK | 400000 | Link |
| GR & LUT | LJ + VCTK | 400000 | Link (failed) |

## Future works