🐛 Bug
I am trying to reproduce the results of the Language Identification task with the XLS-R model on the VoxLingua107 dataset, but following the current instructions yields several errors.
More specifically, I can't run the first command, which, according to the instructions, is the following:
For starters, the path to `gen_audio_embedding.py` should be `examples/wav2vec/xlsr/scripts/gen_audio_embedding.py` (and not `examples/wav2vec/gen_audio_embedding.py`).
Then, it seems like the `audio_classification` task no longer exists, so the script fails in this line: https://github.com/facebookresearch/fairseq/blob/main/examples/wav2vec/xlsr/scripts/gen_audio_embedding.py#L27
We can update the line to the following, but not sure if this is correct:
After that, the task in the command line also has to change, and I changed it to `audio_finetuning` (but again, not sure if this is right).
After these changes, I still can't run the code, since it yields the following error:
Additionally, it is not clear where to obtain the `manifest` or the `test.tsv` files from the VoxLingua107 dataset. Could you please clarify?
Thanks!
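On the manifest question: as far as I can tell, the wav2vec examples in fairseq use TSV manifests whose first line is the audio root directory, followed by one `relative_path<TAB>num_frames` row per file. The sketch below is my own guess at how a `test.tsv` could be built for a local copy of the dataset. The `build_manifest` helper is hypothetical (it is not part of fairseq), it assumes the audio is stored as PCM `.wav` files, and it does not produce any label file the task may also need.

```python
import os
import wave


def build_manifest(root, out_path, ext=".wav"):
    """Write a wav2vec-style TSV manifest (hypothetical helper, not a fairseq API).

    Line 1 is the absolute root directory; each following line is
    `relative_path<TAB>num_frames` for one audio file under that root.
    """
    root = os.path.abspath(root)
    rows = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if not name.lower().endswith(ext):
                continue
            full = os.path.join(dirpath, name)
            # Assumption: files are PCM WAV, so the stdlib reader can count frames.
            with wave.open(full, "rb") as wav:
                frames = wav.getnframes()
            rows.append((os.path.relpath(full, root), frames))
    rows.sort()  # deterministic order regardless of filesystem traversal
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(root + "\n")
        for rel, frames in rows:
            out.write(f"{rel}\t{frames}\n")
    return rows
```

Calling `build_manifest("/path/to/voxlingua107", "test.tsv")` (the path is a placeholder) would write the manifest and return the rows it wrote. Whether this matches what `gen_audio_embedding.py` expects is exactly what I would like confirmed.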
To Reproduce
Steps to reproduce the behavior:
Expected behavior
The logits/embeddings from the XLSR model for the VoxLingua107 dataset should be extracted and written to `/tmp/tmp_voxling_infer.npz`.
Environment
How you installed fairseq (`pip`, source): pip