flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

MLS Docker inference examples #940

Open loretoparisi opened 3 years ago

loretoparisi commented 3 years ago

Question

Please provide examples of running inference with the MLS pretrained tokens & lexicon together with the acoustic and language models.

Additional Context

Currently, an example command to run wav2letter inference with the latest Docker image is the following:

sudo docker run --rm -v ~:/root/host/ -it --ipc=host --name w2l \
    -a stdin -a stdout -a stderr wav2letter/wav2letter:inference-latest \
    sh -c "cat /root/host/audio/LibriSpeech/dev-clean/777/126732/777-126732-0070.flac.wav | /root/wav2letter/build/inference/inference/examples/simple_streaming_asr_example --input_files_base_path /root/host/model"

I have recently built a simpler Docker image to run wav2vec inference here. It would be cool to have a simple pipeline for MLS/wav2letter as well!

tlikhomanenko commented 3 years ago

cc @vineelpratap @xuqiantong

vineelpratap commented 3 years ago

Hi, to run inference please follow the commands here: https://github.com/facebookresearch/wav2letter/tree/master/recipes/mls#decoding, using the latest docker image from the flashlight repo. Note that we provide pretrained models only for offline ASR, not for streaming ASR (https://github.com/facebookresearch/wav2letter/issues/920), and hence simple_streaming_asr_example cannot be used.
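
For concreteness, a sketch of what running the MLS recipe decoding inside the flashlight docker image might look like; the image tag, mount point, and config path below are assumptions, not the exact recipe values:

# Sketch only: run MLS beam-search decoding inside the flashlight docker image.
# Adjust the image tag and the mounted model/config paths to your setup.
sudo docker run --rm -v ~/mls_models:/data -it flml/flashlight:cuda-latest \
    /flashlight/build/bin/asr/fl_asr_decode --flagsfile=/data/decode/es.cfg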

loretoparisi commented 3 years ago

@vineelpratap thanks! So once I enter the docker container, I run the commands for decoding, but I see two different syntaxes here. For beam-search decoding, we have:

/flashlight/build/bin/asr/fl_asr_decode --flagsfile=decode/[lang].cfg

while for Viterbi decoding we have:

fl_asr_test --am=[...]/am.bin --lexicon=[...]/train_lexicon.txt --datadir=[...] --test=test.lst --tokens=[...]/tokens.txt --emission_dir='' --nouselexicon --show

Why?

vineelpratap commented 3 years ago

That's true.

fl_asr_test is for Viterbi decoding, while fl_asr_decode is for beam-search decoding with a language model. If you just care about getting the best WER, please use the latter.
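
For context, a decode flagsfile along the lines of the MLS recipe might look roughly like the sketch below. The flag names come from the flashlight ASR app, but the paths and tuning values are illustrative placeholders rather than the shipped MLS settings:

# decode/[lang].cfg (sketch; placeholder paths and tuning values)
--am=/data/am.bin
--tokens=/data/tokens.txt
--lexicon=/data/lexicon.txt
--lm=/data/lm.bin
--datadir=/data
--test=test.lst
--lmtype=kenlm
--decodertype=wrd
--beamsize=500
--lmweight=2.0
--wordscore=1.0

fl_asr_test takes the same acoustic-model flags on the command line but has no LM or beam flags, which is why the two invocations look different.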

schipoco commented 3 years ago

Hello,

I don't know if this kind of question is proper to ask on GitHub, but since I have been fighting with this for the last few days, I decided to ask. I am trying to learn how to train a speech recognition system in Spanish using Python, and I found out about wav2letter via the following link: https://ai.facebook.com/blog/a-new-open-data-set-for-multilingual-speech-research/, which led me here: https://github.com/facebookresearch/wav2letter/tree/master/recipes/mls. I downloaded the proper files and tried to follow the USAGE STEPS in wav2letter/recipes/mls/README.md, but it is not clear to me whether I first need to build flashlight to get the decoding binaries.

vineelpratap commented 3 years ago

Hi, yes, once you build flashlight (https://github.com/facebookresearch/flashlight#building-and-installing), it will also build the binaries for decoding. You can then use the commands mentioned in the MLS recipe to run decoding.
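
For anyone following along, the build step might look roughly like this. This is a sketch based on the flashlight README; the FL_BUILD_APP_ASR CMake option is an assumption that may vary across flashlight versions:

# Sketch: build flashlight with the ASR app so that fl_asr_test and
# fl_asr_decode end up under build/bin/asr.
git clone https://github.com/facebookresearch/flashlight.git
cd flashlight && mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -DFL_BUILD_APP_ASR=ON
make -j$(nproc)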

loretoparisi commented 3 years ago

@vineelpratap is it possible to build using the provided Dockerfile here and then use the MLS recipe to run the decoder in the same way?

vineelpratap commented 3 years ago

Yes, that is also possible.
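
Roughly, such a flow might look like the following sketch; the image name, mount point, and config name are all assumptions:

# Sketch: build an image from the repo's Dockerfile, then decode inside it.
docker build -t flashlight-local .
docker run --rm -v ~/mls:/data -it flashlight-local \
    /flashlight/build/bin/asr/fl_asr_decode --flagsfile=/data/decode/en.cfg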