flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

MLS Docker inference examples #940

Open loretoparisi opened 3 years ago

loretoparisi commented 3 years ago

Question

Please provide examples of running inference with the MLS pretrained tokens & lexicon together with the acoustic and language models.

Additional Context

Currently, an example command to run wav2letter inference with the latest Docker image is the following:

sudo docker run --rm -v ~:/root/host/ -it --ipc=host --name w2l \
    -a stdin -a stdout -a stderr wav2letter/wav2letter:inference-latest \
    sh -c "cat /root/host/audio/LibriSpeech/dev-clean/777/126732/777-126732-0070.flac.wav | /root/wav2letter/build/inference/inference/examples/simple_streaming_asr_example --input_files_base_path /root/host/model"

I have recently built a simpler Docker image to run wav2vec inference here. It would be cool to have a simple pipeline for MLS/wav2letter as well!

tlikhomanenko commented 3 years ago

cc @vineelpratap @xuqiantong

vineelpratap commented 3 years ago

Hi, to run inference please follow the commands here: https://github.com/facebookresearch/wav2letter/tree/master/recipes/mls#decoding, using the latest docker image from the flashlight repo. Note that we provide pretrained models only for offline ASR, not for streaming ASR (https://github.com/facebookresearch/wav2letter/issues/920), and hence simple_streaming_asr_example cannot be used.
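
For concreteness, a sketch of what running the MLS recipe decoding inside the flashlight docker image might look like; the image tag, mount point, and config path below are assumptions, not the exact recipe values:

# Sketch only: run MLS beam-search decoding inside the flashlight docker image.
# Adjust the image tag and the mounted model/config paths to your setup.
sudo docker run --rm -v ~/mls_models:/data -it flml/flashlight:cuda-latest \
    /flashlight/build/bin/asr/fl_asr_decode --flagsfile=/data/decode/es.cfg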

loretoparisi commented 3 years ago

@vineelpratap thanks! So once I enter the docker container, I run the commands for decoding, but I see two different syntaxes here. For beam-search decoding, we have:

/flashlight/build/bin/asr/fl_asr_decode --flagsfile=decode/[lang].cfg

while for Viterbi decoding we have:

fl_asr_test --am=[...]/am.bin --lexicon=[...]/train_lexicon.txt --datadir=[...] --test=test.lst --tokens=[...]/tokens.txt --emission_dir='' --nouselexicon --show

Why?

vineelpratap commented 3 years ago

That's true.

fl_asr_test is for Viterbi decoding, while fl_asr_decode is for beam-search decoding with a language model. If you just care about getting the best WER, please use the latter.
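
For context, a decode flagsfile along the lines of the MLS recipe might look roughly like the sketch below. The flag names come from the flashlight ASR app, but the paths and tuning values are illustrative placeholders rather than the shipped MLS settings:

# decode/[lang].cfg (sketch; placeholder paths and tuning values)
--am=/data/am.bin
--tokens=/data/tokens.txt
--lexicon=/data/lexicon.txt
--lm=/data/lm.bin
--datadir=/data
--test=test.lst
--lmtype=kenlm
--decodertype=wrd
--beamsize=500
--lmweight=2.0
--wordscore=1.0

fl_asr_test takes the same acoustic-model flags on the command line but has no LM or beam flags, which is why the two invocations look different.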

schipoco commented 3 years ago

Hello,

I don't know if this kind of question is proper to ask on GitHub, but since I have been fighting with this for the last few days, I decided to ask. I am trying to learn how to train a speech recognition system in Spanish using Python, and I found out about wav2letter via the following link: https://ai.facebook.com/blog/a-new-open-data-set-for-multilingual-speech-research/, which led me here: https://github.com/facebookresearch/wav2letter/tree/master/recipes/mls. I downloaded the proper files and tried to follow the USAGE STEPS in wav2letter/recipes/mls/README.md, but it is not clear to me whether I first need to build flashlight to get the decoding binaries.

vineelpratap commented 3 years ago

Hi, yes, once you build flashlight (https://github.com/facebookresearch/flashlight#building-and-installing), it will also build the binaries for decoding. You can then use the commands mentioned in the MLS recipe to run decoding.
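
For anyone following along, the build step might look roughly like this. This is a sketch based on the flashlight README; the FL_BUILD_APP_ASR CMake option is an assumption that may vary across flashlight versions:

# Sketch: build flashlight with the ASR app so that fl_asr_test and
# fl_asr_decode end up under build/bin/asr.
git clone https://github.com/facebookresearch/flashlight.git
cd flashlight && mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DFL_BUILD_APP_ASR=ON
make -j$(nproc)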

loretoparisi commented 3 years ago

@vineelpratap is it possible to build using the provided Dockerfile here and then use the MLS recipe to run the decoder in the same way?

vineelpratap commented 3 years ago

Yes, that is also possible.
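
Roughly, such a flow might look like the following sketch; the image name, mount point, and config name are all assumptions:

# Sketch: build an image from the repo's Dockerfile, then decode inside it.
docker build -t flashlight-local .
docker run --rm -v ~/mls:/data -it flashlight-local \
    /flashlight/build/bin/asr/fl_asr_decode --flagsfile=/data/decode/en.cfg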