facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Need help in running prepare_text.sh for Wav2Vec2-U #5459

Open Nemesis-19 opened 6 months ago

Nemesis-19 commented 6 months ago

My task: reproducing the Wav2Vec2-U results on the LibriSpeech 960h corpus.

I have created the train/valid/test.tsv files, for example:

    /path/to/data/
    3764-168670-0031.wav 131840
    8455-210777-0060.wav 113200
    7902-96592-0047.wav 74160
    237-134500-0009.wav 35520

Next, removed silences using rVADFast and generated train/valid/test.vads files, for example:

    5120:16960 17120:67520 71840:123360
    3680:43200 47200:56320 57920:108320
    3360:52640 59520:72800
    1440:33600
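A minimal sketch of a consistency check between the manifest and the .vads file. It assumes one .vads line per manifest entry, in the same order, with each segment written as start:end in samples. The function names are hypothetical, and the grouping of the sample segments into four lines is inferred from where the start offsets reset.

```python
def parse_manifest(text):
    """Parse a manifest: root directory on line 1, then '<path> <num_samples>' per line."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    root = lines[0].strip()
    entries = []
    for ln in lines[1:]:
        # Split on the last run of whitespace so tabs and spaces both work.
        path, num_samples = ln.rsplit(None, 1)
        entries.append((path, int(num_samples)))
    return root, entries


def parse_vads(text):
    """Parse a .vads file: one line per utterance, space-separated 'start:end' pairs."""
    utterances = []
    for ln in text.splitlines():
        segments = []
        for pair in ln.split():
            start, end = pair.split(":")
            segments.append((int(start), int(end)))
        utterances.append(segments)
    return utterances


def check_vads(entries, vads):
    """Return a list of problems; an empty list means every segment fits its file."""
    problems = []
    if len(entries) != len(vads):
        problems.append(f"line count mismatch: {len(entries)} vs {len(vads)}")
    for (path, num_samples), segments in zip(entries, vads):
        prev_end = 0
        for start, end in segments:
            # Segments should be ordered, non-overlapping, and inside the file.
            if not (prev_end <= start < end <= num_samples):
                problems.append(f"{path}: bad segment {start}:{end} (length {num_samples})")
            prev_end = end
    return problems


# Sample data copied from the examples above.
MANIFEST = (
    "/path/to/data/\n"
    "3764-168670-0031.wav\t131840\n"
    "8455-210777-0060.wav\t113200\n"
    "7902-96592-0047.wav\t74160\n"
    "237-134500-0009.wav\t35520\n"
)
VADS = (
    "5120:16960 17120:67520 71840:123360\n"
    "3680:43200 47200:56320 57920:108320\n"
    "3360:52640 59520:72800\n"
    "1440:33600\n"
)

root, entries = parse_manifest(MANIFEST)
print(check_vads(entries, parse_vads(VADS)))
```

Under that grouping, every segment ends before the sample count listed for the corresponding file, which is at least consistent with the files being aligned line by line.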

Then created the .wrd and .ltr files for all 3, for example:

test.wrd:

    WHY I COULD TIE YOU UP IN A KNOT AND HEAVE YOU OFF THE CLIFF ANY DAY WHAT A GAME

test.ltr:

    W H Y | I | C O U L D | T I E | Y O U | U P | I N | A | K N O T | A N D | H E A V E | Y O U | O F F | T H E | C L I F F | A N Y | D A Y | W H A T | A | G A M E |
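For reference, the .wrd to .ltr mapping shown in this example can be sketched as follows. The function name is hypothetical; the rule (letters separated by spaces, "|" after every word) is read off the example rather than taken from any particular script.

```python
def words_to_letters(line):
    """Convert a word transcript line into the letter format with '|' word boundaries."""
    words = line.split()
    if not words:
        return ""
    # Join words with '|', space out every character, then close the last word with '|'.
    return " ".join("|".join(words)) + " |"


print(words_to_letters("WHAT A GAME"))
```

Applied to the full test.wrd line, this reproduces the test.ltr line character for character.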

Now, the next step is creating .phn for all 3, for which I need to run the prepare_text.sh file.

Can someone please guide me in running this script? I am confused about which arguments to pass. It takes seven positional arguments: lg=$1, text_path=$2, target_dir=$3, min_phones=$4, phonemizer=$5, lid_path=$6, sil_prob=$7

On my end, I know that lg = en, target_dir = the directory to save results in, phonemizer = espeak, and lid_path = lid.176.bin
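To make the question concrete, here is a rough sketch of how the call would be assembled in the positional order the script reads ($1 to $7). The helper name, the script location, and the use of zsh as the interpreter are assumptions. The values for text_path, min_phones, and sil_prob are placeholders only, not recommendations, since those are exactly the arguments in question.

```python
import shlex


def prepare_text_argv(script, lg, text_path, target_dir, min_phones,
                      phonemizer, lid_path, sil_prob, shell="zsh"):
    """Build the argument vector in the order lg, text_path, target_dir,
    min_phones, phonemizer, lid_path, sil_prob."""
    return [
        shell, script,
        lg, text_path, target_dir, str(min_phones),
        phonemizer, lid_path, str(sil_prob),
    ]


argv = prepare_text_argv(
    script="scripts/prepare_text.sh",  # hypothetical location of the script
    lg="en",
    text_path="/path/to/text.txt",     # placeholder
    target_dir="/path/to/output",
    min_phones=1000,                   # placeholder value
    phonemizer="espeak",
    lid_path="lid.176.bin",
    sil_prob=0.5,                      # placeholder value
)
print(shlex.join(argv))
# To actually run it: subprocess.run(argv, check=True)
```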

I am unsure about the others (text_path, min_phones, and sil_prob). Can someone please verify my steps so far and guide me in running prepare_text.sh? If everything is correct, what do these parameters mean and what should I pass for them?

Thanks and regards