kingkong135 closed this issue 1 year ago
Hi, I used the VietBibleVox dataset available at https://huggingface.co/datasets/ntt123/VietBibleVox. For preprocessing steps, please refer to this notebook: https://github.com/NTT123/light-speed/blob/main/prepare_vbx_tfdata.ipynb.
Thank you very much. Have you tried VITS 2, e.g. https://github.com/p0p4k/vits2_pytorch?
I am a bit confused here.
If I am running prepare_vbx_tfdata.ipynb, then I don't have to be concerned with prepare_ljs_tfdata.ipynb, correct?
What files or output should I get when I train an MFA model and then align speech and phonemes (creating a timestamp for each phoneme)? Are these JSON files?
I got tfrecord files; however, they are 0 bytes.
Your help is greatly appreciated.
Hi @UncleBob2,
If I am running the prepare_vbx_tfdata.ipynb, then I don't have to be concerned with the prepare_ljs_tfdata.ipynb, correct?
The VBX notebook is for preprocessing the VietBibleVox (Vietnamese) dataset, while the LJS notebook is for the LJSpeech (English) dataset. If you're focused on using the Vietnamese dataset, then prepare_ljs_tfdata.ipynb is irrelevant.
What files or output should I get when I am training an MFA model, then aligning speech and phonemes (creating a timestamp for each phoneme)? Are these JSON files?
You should expect to see multiple JSON files inside the data/VietBibleVox directory.
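As a quick way to sanity-check those alignment files, here is a hedged sketch of reading one. The key layout below ("tiers" → "phones" → "entries" as [start, end, label] triples) is an assumption based on MFA 2.x JSON output; verify it against one of your own files before relying on it:

```python
import json

# Assumed MFA-style alignment structure (verify against your own JSON files):
sample = {
    "start": 0.0,
    "end": 1.2,
    "tiers": {
        "phones": {
            "type": "interval",
            "entries": [[0.0, 0.35, "t"], [0.35, 0.8, "a"], [0.8, 1.2, "j"]],
        }
    },
}

def phone_intervals(alignment: dict):
    """Return (start, end, phone) triples from an MFA-style alignment dict."""
    return [tuple(e) for e in alignment["tiers"]["phones"]["entries"]]

# To inspect a real file, replace `sample` with:
#   alignment = json.loads(open("data/VietBibleVox/some_utterance.json").read())
for start, end, phone in phone_intervals(sample):
    print(f"{phone}: {start:.2f}s - {end:.2f}s")
```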
I got tfrecord files; however, they are 0 bytes.
This is unexpected; the files should not be empty. Something likely went wrong when you ran the following command:
# replace `nproc` with `sysctl -n hw.physicalcpu` if you are using macOS
!source miniconda/bin/activate aligner; \
mfa train \
--num_jobs `nproc` \
--use_mp \
--clean \
--overwrite \
--no_textgrid_cleanup \
--single_speaker \
--output_format json \
--output_directory VietBibleVox \
VietBibleVox ./lexicon.txt vbx_mfa
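After the MFA run finishes, a small check like the sketch below can confirm that the outputs are sane before you start training. The directory names here ("VietBibleVox" for the JSON alignments, "tfdata" for the tfrecord shards) are assumptions; point them at wherever your notebook actually wrote its outputs:

```python
from pathlib import Path

def check_outputs(json_dir="VietBibleVox", tfrecord_dir="tfdata"):
    """Count alignment JSON files and flag any 0-byte tfrecord shards.

    Directory names are assumptions -- adjust to your own layout.
    """
    jsons = list(Path(json_dir).glob("*.json"))
    print(f"{len(jsons)} alignment JSON files in {json_dir}")
    empty = [p for p in Path(tfrecord_dir).glob("*.tfrecord")
             if p.stat().st_size == 0]
    if empty:
        print("WARNING: empty tfrecord files:", [p.name for p in empty])
    return jsons, empty
```

If `check_outputs` reports zero JSON files, the MFA step failed upstream and the empty tfrecords follow from that.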
Thanks for your prompt reply. I got it working and now have the JSON files. I am currently running train.py, and it will take some time since my RTX 3060 will not arrive for 5 days. Once the model is trained, is it correct that I can then run inference.ipynb? By the way, how many epochs are we running? I notice that the code is set to run for up to 100,000 epochs (for epoch in range(_epoch + 1, 100_000):). Do we have a strategy for early stopping?
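On the early-stopping question: the training loop in the repo runs to the 100,000 cap as written, so if you want to stop earlier you would need to add a guard yourself. Below is a minimal, generic sketch of patience-based early stopping on a validation loss; it is an illustration you could bolt onto any training loop, not part of the repo's API:

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience: int = 10, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Usage would be a call to `step(val_loss)` once per epoch inside the loop, breaking out when it returns True.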
FYI, I am a newbie at TTS; hence, please bear with me as I am ramping up my understanding. I am looking for your guidance and hope to contribute to your project.
I can see that attentions.py, commons.py, and modules.py are called from models.py.
Have a great day.
First, thanks for this great repo. I have a question: are you using this viet-tts-dataset? If so, do you have the preprocessing code used before feeding it to the training model?