Light Speed ⚡ is an open-source text-to-speech model based on VITS, with some modifications:
We provide two pretrained models and demos:
Q: How do I create training data?
A: See the ./prepare_ljs_tfdata.ipynb notebook for instructions on preparing the training data.
Q: How can I train the model with 1 GPU?
A: Run: python train.py
Q: How can I train the model with 4 GPUs?
A: Run: torchrun --standalone --nnodes=1 --nproc-per-node=4 train.py
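The two commands above differ mainly in how many processes get started. torchrun launches one process per GPU and exports the environment variables RANK, LOCAL_RANK, and WORLD_SIZE to each of them, while a plain `python train.py` run sets none of them. The sketch below shows one way a script could read these variables so that the same entry point works in both cases. It is an illustration only and is not taken from train.py; the helper name `read_dist_env` is hypothetical.

```python
import os


def read_dist_env(env=None):
    """Return (rank, local_rank, world_size) from torchrun-style variables.

    Hypothetical helper for illustration; not part of this repository.
    Falls back to single-process defaults when the variables are absent,
    which is the case for a plain `python train.py` run.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))              # global index of this process
    local_rank = int(env.get("LOCAL_RANK", 0))  # index of this process on its node
    world_size = int(env.get("WORLD_SIZE", 1))  # total number of processes
    return rank, local_rank, world_size


if __name__ == "__main__":
    rank, local_rank, world_size = read_dist_env()
    print(f"rank={rank} local_rank={local_rank} world_size={world_size}")
```

Started with `python`, the helper returns the defaults (0, 0, 1). Started with the 4-GPU torchrun command, each of the four processes sees a world size of 4 and its own local rank from 0 to 3, which a script would typically use to pick its GPU.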
Q: How can I train a model to predict phoneme durations?
A: See the ./train_duration_model.ipynb notebook.
Q: How can I generate speech with a trained model?
A: See the ./inference.ipynb notebook.