For testing purposes, I extracted only 200 files (100 pairs) from the VietBibleVox zip data. I then ran the prepare_vbx_tfdata.ipynb notebook, which resulted in the following:
- JSON files were created in the "./data/VietBibleVox" directory.
- The "./data/tfdata/test" directory was created with a single file, "part_000.tfrecords", approximately 56 MB in size.
- The "./data/tfdata/train" directory was created with 256 files named "part_*.tfrecords", but all of them are empty (0 bytes).
- The files "lexicon.dict", "lexicon.txt", "phone_set.json", and "vbx_mfa.zip" are non-empty.
- A directory named "MFA" was created in "$HOME/Documents", with a total size of 86 MB.
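To double-check which shards are actually empty (rather than just small), I used a quick stdlib file-size scan; the glob pattern below matches the "./data/tfdata/train" layout described above:

```python
import glob
import os

# The training shards were reported as 0 bytes; list every empty shard so
# the preparation step can be re-checked shard by shard.
def empty_shards(pattern="./data/tfdata/train/part_*.tfrecords"):
    return [p for p in sorted(glob.glob(pattern)) if os.path.getsize(p) == 0]

if __name__ == "__main__":
    for path in empty_shards():
        print(path)
```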
Afterwards, I attempted to run "python3 train.py", but the process repeatedly printed "0it [00:00, ?it/s]" to the screen. I waited approximately 1 hour before interrupting the process, which seems excessively long for such a small dataset.
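For what it's worth, "0it [00:00, ?it/s]" is the line tqdm prints when it wraps an iterator that yields nothing and has no known length, which would be consistent with the training loop reading from the empty train shards. A minimal reproduction (assuming tqdm is installed):

```python
import io

from tqdm import tqdm

# Wrapping an empty iterator of unknown length reproduces the progress
# line seen from train.py: "0it [00:00, ?it/s]".
buf = io.StringIO()
for _ in tqdm(iter([]), file=buf):
    pass
print(buf.getvalue())
```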
According to the discussion here: https://github.com/NTT123/light-speed/issues/2#issuecomment-1722147852, the tfrecords files should not be empty, so I suspect something went wrong during the preparation step, but I am unable to pinpoint the specific issue.
My equipment: