NTT123 / light-speed

A modified VITS that utilizes phoneme duration's ground truth for better robustness
MIT License
115 stars, 35 forks

Which dataset do you use for VN - Male voice? #2

Closed: kingkong135 closed this issue 1 year ago

kingkong135 commented 1 year ago

First, thanks for this great repo. I have a question: are you using this viet-tts-dataset? If so, do you have the preprocessing code you ran before feeding the data to the training model?

NTT123 commented 1 year ago

Hi, I used the VietBibleVox dataset available at https://huggingface.co/datasets/ntt123/VietBibleVox. For preprocessing steps, please refer to this notebook: https://github.com/NTT123/light-speed/blob/main/prepare_vbx_tfdata.ipynb.

kingkong135 commented 1 year ago

Thank you very much. Have you tried VITS 2 (for example, https://github.com/p0p4k/vits2_pytorch)?

UncleBob2 commented 1 year ago

I am a bit confused here.

Your help is greatly appreciated.

NTT123 commented 1 year ago

Hi @UncleBob2,

> If I am running the prepare_vbx_tfdata.ipynb, then I don't have to be concerned with the prepare_ljs_tfdata.ipynb, correct?

The VBX notebook is for preprocessing the VietBibleVox (Vietnamese) dataset, while the LJS notebook is for the LJSpeech (English) dataset. If you're focused on using the Vietnamese dataset, then prepare_ljs_tfdata.ipynb is irrelevant.

> What files or output should I get when I am training an MFA model, then aligning speech and phonemes (creating a timestamp for each phoneme)? Are these JSON files?

You should expect to see multiple JSON files inside the data/VietBibleVox directory.

> I got tfrecord files; however, they are 0 bytes.

This is unexpected; the files should not be empty. Something likely went wrong when you ran the following command:

# replace `nproc` with `sysctl -n hw.physicalcpu` if you are using MacOS
!source miniconda/bin/activate aligner; \
mfa train \
    --num_jobs `nproc` \
    --use_mp \
    --clean \
    --overwrite \
    --no_textgrid_cleanup \
    --single_speaker \
    --output_format json \
    --output_directory VietBibleVox \
    VietBibleVox ./lexicon.txt vbx_mfa
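
As a quick sanity check after this step, a sketch like the one below could confirm that the alignment produced JSON files and flag any empty tfrecord files. The `data/VietBibleVox` location comes from the answer above; the `data/tfdata` directory and the `*.tfrecord*` file pattern are assumptions for illustration and may not match what the notebook actually writes.

```python
from pathlib import Path


def _files(root, pattern):
    """Return sorted files under `root` matching `pattern`, or [] if `root` is missing."""
    root = Path(root)
    return sorted(root.rglob(pattern)) if root.is_dir() else []


def check_outputs(align_dir, tfdata_dir):
    """Count alignment JSON files and collect zero-byte tfrecord files."""
    json_files = _files(align_dir, "*.json")
    tfrecords = _files(tfdata_dir, "*.tfrecord*")
    empty = [p for p in tfrecords if p.stat().st_size == 0]
    return len(json_files), empty


if __name__ == "__main__":
    # Both paths are assumptions; adjust them to your own checkout.
    n_json, empty = check_outputs("data/VietBibleVox", "data/tfdata")
    print(f"{n_json} alignment JSON files found")
    for path in empty:
        print(f"empty tfrecord: {path}")
```

If the JSON count is zero, the alignment step most likely failed, and empty tfrecord files would be a downstream symptom of that.
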

UncleBob2 commented 1 year ago

Thanks for your prompt reply. I got it working and now have the JSON files. I am currently running train.py, and it will take some time since my RTX 3060 will not arrive for another 5 days. Once the model is trained, is it correct that I can then run the inference.ipynb file? BTW, how many epochs are we running? I notice that the code is set to run for up to 100,000 epochs (for epoch in range(_epoch + 1, 100_000):). Do we have a strategy for early stopping?
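
To make the early-stopping question concrete, here is a minimal sketch of a generic patience-based check. It is not taken from train.py: the `EarlyStopper` class, the `patience` and `min_delta` values, and the idea of monitoring a per-epoch validation loss are all assumptions for illustration. For GAN-style models such as VITS, losses may not track audio quality closely, so this is only the general pattern.

```python
class EarlyStopper:
    """Signal a stop once the monitored loss fails to improve for `patience` checks."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience      # hypothetical default, tune per run
        self.min_delta = min_delta    # minimum decrease that counts as improvement
        self.best = float("inf")
        self.bad_checks = 0

    def should_stop(self, loss):
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience


def train(val_losses, patience=3):
    """Toy loop: `val_losses` stands in for real per-epoch validation results."""
    stopper = EarlyStopper(patience=patience)
    for epoch, loss in enumerate(val_losses, start=1):
        if stopper.should_stop(loss):
            return epoch              # epoch at which training would halt
    return len(val_losses)
```

In a real run, the loop would wrap the existing epoch loop and break out when `should_stop` returns True, ideally after saving the best checkpoint.
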

FYI, I am a newbie at TTS; hence, please bear with me as I am ramping up my understanding. I am looking for your guidance and hope to contribute to your project.

I can see that attentions.py, commons.py, and modules.py are used by models.py.

Have a great day.
