NTT123 / light-speed

A modified VITS that utilizes phoneme duration's ground truth for better robustness
MIT License
115 stars, 35 forks

Training took forever to finish #4

Open nganhtua opened 1 year ago

nganhtua commented 1 year ago

For testing purposes, I extracted only 200 files (100 pairs) from the VietBibleVox zip data. I then ran the prepare_vbx_tfdata.ipynb notebook, which resulted in the following:

Afterwards, I attempted to run "python3 train.py", but the process repeatedly prints "0it [00:00, ?it/s]" to the screen. I waited for approximately 1 hour before interrupting the process. I believe this is an excessively long time for such a small dataset.

According to the discussion at https://github.com/NTT123/light-speed/issues/2#issuecomment-1722147852, the tfrecords files should not be empty, so I suspect that something went wrong during the preparation step, but I am unable to identify the specific issue.
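One way to confirm or rule out empty tfrecords files is to count the records in each file directly. The following is a minimal sketch, not part of the light-speed repository: it walks the standard TFRecord framing (8-byte little-endian length, 4-byte length CRC, payload, 4-byte payload CRC) using only the Python standard library. The function names, the `*.tfrecord*` glob pattern, and the directory passed in are assumptions for illustration. It also assumes the files are uncompressed and does not verify the CRC fields.

```python
import struct
from pathlib import Path


def count_tfrecord_records(path):
    """Count records in an uncompressed TFRecord file by walking its framing."""
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(8)  # uint64 little-endian payload length
            if not header:
                break  # clean end of file
            if len(header) < 8:
                raise ValueError(f"{path}: truncated length header")
            (length,) = struct.unpack("<Q", header)
            # Remaining parts of the record: length CRC (4 bytes),
            # payload (`length` bytes), payload CRC (4 bytes).
            rest = f.read(4 + length + 4)
            if len(rest) != length + 8:
                raise ValueError(f"{path}: truncated record #{count}")
            count += 1
    return count


def report_record_counts(directory, pattern="*.tfrecord*"):
    """Return {file name: record count} for matching files in `directory`.

    The glob pattern is an assumption; adjust it to the actual file names.
    """
    return {
        p.name: count_tfrecord_records(p)
        for p in sorted(Path(directory).glob(pattern))
    }
```

Calling `report_record_counts(...)` on the directory where the notebook wrote its output (the path depends on the notebook configuration) should show a count greater than zero for each file. A count of zero, or no matching files at all, would be consistent with the training loop having nothing to iterate over.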

My equipment: