facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Wav2vec 2.0 Pre-train model Inference error #2606

Closed sooftware closed 4 years ago

sooftware commented 4 years ago

❓ Questions and Help

Hi! Thank you for releasing the Wav2vec 2.0 code.
I'm trying to run inference with the wav2vec 2.0 pre-trained model, but I ran into a problem.

These are the commands I used to set up the environment. I would appreciate it if you could let me know if anything is wrong with them; they may also be a useful reference for anyone who wants to install wav2letter.

# Install torchaudio & sentencepiece

pip install torchaudio
pip install sentencepiece
pip install soundfile

# Update apt-get & Install soundfile

apt-get update \
&& apt-get upgrade -y \
&& apt-get install -y \
&& apt-get -y install apt-utils gcc libpq-dev libsndfile-dev

# Install kenlm

sudo apt install build-essential cmake libboost-system-dev libboost-thread-dev libboost-program-options-dev libboost-test-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev
git clone https://github.com/kpu/kenlm.git
cd kenlm
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DKENLM_MAX_ORDER=20 -DCMAKE_POSITION_INDEPENDENT_CODE=ON
make -j 16
export KENLM_ROOT_DIR=/root/kenlm/
cd ../..

# Install Additional Dependencies (ATLAS, OpenBLAS, Accelerate, Intel MKL)

apt-get install libsndfile1-dev libopenblas-dev libfftw3-dev libgflags-dev libgoogle-glog-dev

# Install wav2letter

git clone -b v0.2 https://github.com/facebookresearch/wav2letter.git
cd wav2letter/bindings/python
pip install -e .
cd ../../..

After installing wav2letter, I installed fairseq by cloning the repository.

While trying various things to get inference running, I found and downloaded the files I needed:

letter vocabulary, lexicon

I use the command below to decode with the Viterbi decoder:

python examples/speech_recognition/infer.py /path/to/manifest/ --task audio_pretraining --nbest 1 --path /path/to/model --gen-subset dev_clean --results-path /path/to/results --w2l-decoder viterbi --word-score -1 --sil-weight 0 --criterion ctc --labels ltr --max-tokens 4000000 --post-process letter
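For context, with `--labels ltr` this command looks in the manifest directory for a `<gen-subset>.tsv` manifest, a matching `<gen-subset>.ltr` letter-target file, and `dict.ltr.txt`. The example contents below are illustrative (the paths and frame counts are mine, not from this run):

```python
# Illustrative manifest (.tsv) contents: first line is the dataset root
# directory, then one "relative_path<TAB>num_frames" row per utterance.
tsv_example = """/data/LibriSpeech/test-clean
1089/134686/1089-134686-0000.flac\t166960
1089/134686/1089-134686-0001.flac\t80640
"""

# The .ltr file has one line per .tsv row (after the header), in the same
# order: characters separated by spaces, each word terminated by "|".
ltr_example = "H E L L O | W O R L D |\nG O O D | M O R N I N G |\n"

# Row counts must line up, otherwise label lookup fails during decoding:
assert len(tsv_example.strip().splitlines()) - 1 == len(ltr_example.strip().splitlines())
```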

But I encounter this error message:

INFO:fairseq.data.audio.raw_audio_dataset:loaded 2620, skipped 0 samples
INFO:__main__:| . test-clean 2620 examples
INFO:__main__:| decoding with criterion ctc
INFO:__main__:| loading model(s) from /data/project/rw/kaki/wav2vec/wav2vec2_vox_960h.pt
/root/fairseq/examples/speech_recognition/w2l_decoder.py:39: UserWarning: wav2letter python bindingbindings
  "wav2letter python bindings are required to use this functionality. Please install from https://g
Traceback (most recent call last):
  File "examples/speech_recognition/infer.py", line 429, in <module>
    cli_main()
  File "examples/speech_recognition/infer.py", line 425, in cli_main
    main(args)
  File "examples/speech_recognition/infer.py", line 336, in main
    for sample in t:
  File "/opt/conda/lib/python3.7/site-packages/tqdm/std.py", line 1129, in __iter__
    for obj in iterable:
  File "/root/fairseq/fairseq/data/iterators.py", line 60, in __iter__
    for x in self.iterable:
  File "/root/fairseq/fairseq/data/iterators.py", line 546, in __next__
    raise item
  File "/root/fairseq/fairseq/data/iterators.py", line 478, in run
    for item in self._source:
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 363, in __next
    data = self._next_data()
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 989, in _next_
    return self._process_data(data)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 1014, in _proc
    data.reraise()
  File "/opt/conda/lib/python3.7/site-packages/torch/_utils.py", line 395, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/worker.py", line 185, in _wo
    data = fetcher.fetch(index)
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/opt/conda/lib/python3.7/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <list
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/root/fairseq/fairseq/data/add_target_dataset.py", line 27, in __getitem__
    item["label"] = self.get_label(index)
  File "/root/fairseq/fairseq/data/add_target_dataset.py", line 23, in get_label
    return self.labels[index] if self.process_label is None else self.process_label(self.labels[ind
IndexError: list index out of range

I am puzzled that this error occurs even though no KenLM or Transformer LM is being used.
Here is what I suspect: to decode test-clean, a test-clean.ltr label file seems to be needed.

But I didn't know where that file comes from, so I just renamed dict.ltr.txt (downloaded from the letter vocabulary link above) to test-clean.ltr.

I have looked through various issues and the README, but I still don't know what the problem is. Any help would be greatly appreciated. Thank you!
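The traceback ends in add_target_dataset.py indexing a labels list, which suggests the label file has fewer lines than the manifest has audio rows (renaming the 30-odd-line dict.ltr.txt to test-clean.ltr would cause exactly that). A quick line-count check makes the mismatch visible; the function name and paths here are illustrative, not part of fairseq:

```python
# Diagnostic for the IndexError above: one label line is consumed per manifest
# row, so <subset>.tsv (minus its root-directory header line) and <subset>.ltr
# must have the same number of lines.
def manifest_label_mismatch(tsv_path, ltr_path):
    """Return (n_audio_rows, n_label_lines); unequal counts mean IndexError at decode time."""
    with open(tsv_path) as f:
        n_audio = sum(1 for _ in f) - 1  # first .tsv line is the dataset root dir
    with open(ltr_path) as f:
        n_labels = sum(1 for _ in f)
    return n_audio, n_labels

# e.g. manifest_label_mismatch("manifest/test-clean.tsv", "manifest/test-clean.ltr")
```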


lematt1991 commented 4 years ago

CC @alexeib

sooftware commented 4 years ago

Never mind, I solved it.

Update apt-get & Install soundfile

apt-get update \
  && apt-get upgrade -y \
  && apt-get install -y \
  && apt-get -y install apt-utils gcc libpq-dev libsndfile-dev

Install kenlm

sudo apt install build-essential cmake libboost-system-dev libboost-thread-dev libboost-program-options-dev libboost-test-dev libeigen3-dev zlib1g-dev libbz2-dev liblzma-dev
git clone https://github.com/kpu/kenlm.git
cd kenlm
mkdir -p build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DKENLM_MAX_ORDER=20 -DCMAKE_POSITION_INDEPENDENT_CODE=ON
make -j 16
export KENLM_ROOT_DIR=/root/kenlm/
cd ../..

Install Additional Dependencies (ATLAS, OpenBLAS, Accelerate, Intel MKL)

apt-get install libsndfile1-dev libopenblas-dev libfftw3-dev libgflags-dev libgoogle-glog-dev

Install wav2letter

git clone -b v0.2 https://github.com/facebookresearch/wav2letter.git
cd wav2letter/bindings/python
pip install -e .
cd ../../..


Viterbi decoding script (arguments)

DATASET_PATH=$1
TESTSET=$2
EXT=$3  # flac or wav
MODEL_PATH=$4

testset_path=$1$2
tsv_path="./manifest/$2.tsv"

Download dict.ltr.txt

wget https://dl.fbaipublicfiles.com/fairseq/wav2vec/dict.ltr.txt

Prepare Inference

python examples/wav2vec/wav2vec_manifest.py $testset_path --dest ./manifest/ --ext $EXT --valid-percent 0.0
mv ./manifest/train.tsv $tsv_path
python libri_labels.py $tsv_path --output-dir ./manifest/ --output-name $TESTSET
mv "./manifest/$2.wrd" "./manifest/$2.wrd.txt"
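The libri_labels.py step is what produces the missing `<subset>.ltr` (and `.wrd`) label files from the manifest: one transcript per manifest row, with the `.ltr` version written as space-separated characters and each word terminated by `|`. A minimal sketch of that conversion (the function name is my own, not fairseq's):

```python
# Sketch of the letter-target format that libri_labels.py emits per transcript:
# characters separated by spaces, each word followed by a "|" boundary token.
def words_to_letters(transcript: str) -> str:
    return " ".join(" ".join(word) + " |" for word in transcript.split())

print(words_to_letters("HELLO WORLD"))  # H E L L O | W O R L D |
```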

Run Viterbi

python examples/speech_recognition/infer.py $testset_path --task audio_pretraining \
  --nbest 1 --path $MODEL_PATH --gen-subset $TESTSET --results-path ./manifest/ \
  --w2l-decoder viterbi --word-score -1 --sil-weight 0 --criterion ctc --labels ltr \
  --max-tokens 4000000 --post-process letter
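After decoding, infer.py writes hypothesis and reference transcripts under `--results-path` (the exact file names depend on the fairseq version, so check the results directory). To score them yourself, a small word-error-rate sketch over paired transcript lines:

```python
# Word error rate = (substitutions + insertions + deletions) / reference words,
# computed with a classic dynamic-programming edit distance over word lists.
def edit_distance(ref, hyp):
    dp = list(range(len(hyp) + 1))  # distances against an empty reference
    for i, r in enumerate(ref, 1):
        prev = dp[0]
        dp[0] = i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,        # deletion
                        dp[j - 1] + 1,    # insertion
                        prev + (r != h))  # substitution (0 cost on match)
            prev = cur
    return dp[-1]

def wer(refs, hyps):
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    return errors / words

print(wer(["HELLO WORLD"], ["HELLO WORD"]))  # 0.5
```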