My task: reproducing the Wav2Vec2-U results on the LibriSpeech 960h corpus.
I have created the train/valid/test.tsv files, for example:
/path/to/data/
3764-168670-0031.wav 131840
8455-210777-0060.wav 113200
7902-96592-0047.wav 74160
237-134500-0009.wav 35520
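For reference, this is roughly how I generate the manifests. It is a minimal sketch, not the official fairseq script: it assumes the audio is already PCM WAV (so the standard-library wave module can read it), that the first line holds the root directory, and that each following line is a relative path and a sample count separated by a tab. The function name write_manifest is my own.

```python
import os
import wave


def write_manifest(root, tsv_path):
    # First line: root directory. Following lines: "<relative path>TAB<num samples>".
    root = os.path.abspath(root)
    rows = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".wav"):
                full = os.path.join(dirpath, name)
                with wave.open(full, "rb") as w:
                    rows.append((os.path.relpath(full, root), w.getnframes()))
    rows.sort()  # deterministic order, so other per-line files can be aligned to it
    with open(tsv_path, "w") as out:
        print(root, file=out)
        for rel, frames in rows:
            print(rel + "\t" + str(frames), file=out)
    return rows
```

The line order matters because the .vads, .wrd and .ltr files are matched to the manifest line by line.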
Next, I removed silences using rVADFast and generated the train/valid/test.vads files, for example:
5120:16960 17120:67520 71840:123360
3680:43200 47200:56320 57920:108320
3360:52640 59520:72800
1440:33600
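To sanity-check the .vads files I use something like the sketch below. It assumes each line is a space-separated list of start:end pairs in samples, that the end index is exclusive, and that line i corresponds to entry i of the .tsv file. The helper names are my own.

```python
def parse_vads_line(line):
    # "3360:52640 59520:72800" -> [(3360, 52640), (59520, 72800)]
    segments = []
    for token in line.split():
        start, end = token.split(":")
        segments.append((int(start), int(end)))
    return segments


def kept_samples(segments):
    # Number of samples that remain once everything outside the segments is dropped.
    return sum(end - start for start, end in segments)


segments = parse_vads_line("3360:52640 59520:72800")
print(segments, kept_samples(segments))
```

Comparing kept_samples against the sample count in the .tsv line is a quick way to see how much audio is treated as silence.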
Then I created the .wrd and .ltr files for all three splits, for example:
test.wrd:
WHY I COULD TIE YOU UP IN A KNOT AND HEAVE YOU OFF THE CLIFF ANY DAY WHAT A GAME
test.ltr:
W H Y | I | C O U L D | T I E | Y O U | U P | I N | A | K N O T | A N D | H E A V E | Y O U | O F F | T H E | C L I F F | A N Y | D A Y | W H A T | A | G A M E |
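The .ltr lines are derived from the .wrd lines roughly as follows. This is a small sketch of my own conversion (the function name words_to_letters is mine): letters are separated by spaces and every word, including the last one, is followed by a "|" marker.

```python
def words_to_letters(line):
    # "WHY I" -> "W H Y | I |"
    words = line.split()
    return " ".join(" ".join(word) + " |" for word in words)


print(words_to_letters("WHY I COULD TIE YOU UP"))
```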
Now, the next step is creating the .phn files for all three splits, for which I need to run the prepare_text.sh script.
Can someone please guide me in running this script? I am confused about which parameters to pass.
It takes: lg=$1, text_path=$2, target_dir=$3, min_phones=$4, phonemizer=$5, lid_path=$6, sil_prob=$7
From my side, I know that lg = en, target_dir = the directory to save results in, phonemizer = espeak and lid_path = lid.176.bin.
I am unsure about the others (text_path, min_phones and sil_prob). Can someone please verify my steps so far and guide me in running prepare_text.sh?
(If everything is correct, what do the parameters mean and what should I pass for them?)
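To make the question concrete, this is how I am assembling the call at the moment. It is only a sketch: the positional order comes from the script header quoted above, running it with zsh is my assumption, and the values for text_path, min_phones and sil_prob are unverified placeholders, not recommendations.

```python
import shlex

# Positional order as listed in the script header ($1 .. $7).
PARAM_ORDER = ["lg", "text_path", "target_dir", "min_phones",
               "phonemizer", "lid_path", "sil_prob"]


def build_command(script, params, shell="zsh"):
    # shell="zsh" is an assumption about how the script is meant to be run.
    missing = [name for name in PARAM_ORDER if name not in params]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return [shell, script] + [str(params[name]) for name in PARAM_ORDER]


params = {
    "lg": "en",
    "text_path": "/path/to/text/file",   # placeholder: unsure which text file goes here
    "target_dir": "/path/to/output/dir",
    "min_phones": "MIN_PHONES",          # placeholder: value unknown to me
    "phonemizer": "espeak",
    "lid_path": "lid.176.bin",
    "sil_prob": "SIL_PROB",              # placeholder: value unknown to me
}
print(shlex.join(build_command("prepare_text.sh", params)))
```

Once I know the right values, the resulting list could be passed to subprocess.run instead of being printed.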
Thanks and regards