flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Decode binary hangs on longer audios (SOTA model) #712

Closed abhinavkulkarni closed 4 years ago

abhinavkulkarni commented 4 years ago

Hi,

I ran the 2019 SOTA model on two segments of the same audio file (one 22 seconds long, the other 5 minutes) using the Decode binary. The model seems to hang on the longer segment (I have waited for more than 30 minutes). Is this a known behavior?

The setup was exactly the same in both cases; the only thing that changed was the text file that lists the audio file paths, durations, and (ground truth) transcriptions.

Here's the command (I downloaded the individual pieces of the model into the sota directory, so its structure is different from that in the recipe):

:/root/wav2letter/build/Decoder \
--am=sota/am_transformer_ctc_librivox_dev_other.bin \
--tokensdir=sota \
--tokens=librispeech-train-all-unigram-10000.tokens \
--lexicon=sota/decoder-unigram-10000-nbest10.lexicon \
--lm=sota/lm_librispeech_kenlm_word_4g_200kvocab.bin \
--datadir=sota \
--test=test-other.lst \
--uselexicon=true \
--sclite=log \
--decodertype=wrd \
--lmtype=kenlm \
--silscore=0 \
--beamsize=500 \
--beamsizetoken=100 \
--beamthreshold=100 \
--nthread_decoder=8 \
--smearing=max \
--show \
--lmweight=0.61603454256618 \
--wordscore=0.96560269382887
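
(For reference, two segments like the ones compared here can be carved out of the source recording with sox; the offsets and output file names below are only illustrative, not the exact ones used:)

# cut a 22-second and a 5-minute segment from the source wav (sox trim <start> <duration>)
sox 'JRE_Elon_Musk_#1470.wav' segment_22s.wav trim 0 22
sox 'JRE_Elon_Musk_#1470.wav' segment_5min.wav trim 0 300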

Here's what I have in the sota directory:

:tree sota/
sota/
|-- am_transformer_ctc_librivox_dev_other.bin
|-- am_transformer_s2s_librivox_dev_other.bin
|-- decoder-unigram-10000-nbest10.lexicon
|-- librispeech-train+dev-unigram-10000-nbest10.lexicon
|-- librispeech-train-all-unigram-10000.tokens
|-- lm_librispeech_kenlm_word_4g_200kvocab.bin
|-- lm_librispeech_kenlm_wp_10k_6gram_pruning_000012.bin
`-- test-other.lst

0 directories, 8 files

Here's what I have in the sota/test-other.lst file:

:0 /root/host/audio/JRE_Elon_Musk_#1470.wav 300000  well come back here go when you think about when your child is born you will know for the rest of this child's life you were born during a weary time that was that is for sure the probably the weird that i can remember ah yes yes and he was born on may fourth and yet that's where too he that work be with him the has to be i hope i shall have some perfect yet i mean that is the perfect day for you i'm and what how do you say the name ah so is it does it feel strange to have a child while this crazy is going to feel you've had children before it is any queer ah it's i think it's better being older in having a kid i appreciate it more get papers horse they are there and when i didn't want to have any of my own i would see other people's kids and i didn't not like that sir but i wasn't drawn to them but now when i see little people's kids and went i think of this old love packages said love just you you think of them different differently when you see them come out and then grow and then start talking to you like your whole idea what a baby is is very different you so now as you you don't get older and get appreciated it as a mature fully formed adult it must be really pretty wonderful yet it wonderful it's great for his horse they are dear that a a great arm yes i i also spent a lot of time on a i and the mat mats and see you instead of the cut of the brain which is you and i know that is trying to to simulate what a brain does base a and you can consider see the learning very quickly no you just well see things fight so you talk about the girl that you're not talking about it an actual baby and a baby but both of em yes but the word that comes from the brute the brain is like a net of your so you know it's like the you are the ye against the you as a great so when you are programme artificial intelligence were you working with artificial intelligence art are they specifically trying a mimic the development developmental process of a human brain in a lot ways there some ways that are different i you are an analogy that often used is like you a weed we don't make a submarine swim like a fish but we take the principles of of how you know what would have hydro and applied them to the submarine i was one as a lay person do try to achieve the same results as a human brain but through different methods or do you try to copy the way a human brain achieves results i the essential elements of an a r really very very somewhat to human brain you not here having them the multiple airs from your hands and a you're back propagation these all these things are what your brain is and it's yes i that you have a letter of your aunt that goes through the intermediate steps to ultimately cognition and that and then it will reverse those staffs to go back and forth and all over the place a it's some that it's interesting thing
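
(The third column is the duration; 300000 matches the 5-minute segment, so it appears to be in milliseconds. As a rough sketch, assuming sox/soxi is installed and using a placeholder transcription, an entry like this can be regenerated with:)

# .lst entry format: id path duration transcription, with duration in milliseconds
dur_ms=$(soxi -D '/root/host/audio/JRE_Elon_Musk_#1470.wav' | awk '{printf "%.0f", $1 * 1000}')
echo "0 /root/host/audio/JRE_Elon_Musk_#1470.wav $dur_ms <transcription here>" >> sota/test-other.lst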

Thanks!

abhinavkulkarni commented 4 years ago

Moreover, the binary fails on am_tds_s2s_librivox_dev_other (everything else being the same):

I0620 21:46:52.741475   865 Decode.cpp:127] Number of classes (network): 9998
I0620 21:46:54.322589   865 Decode.cpp:134] Number of words: 200001
I0620 21:46:54.796387   865 Decode.cpp:247] [Decoder] LM constructed.
I0620 21:46:57.361129   865 Decode.cpp:274] [Decoder] Trie planted.
I0620 21:46:57.789614   865 Decode.cpp:286] [Decoder] Trie smeared.
I0620 21:46:58.397217   865 W2lListFilesDataset.cpp:141] 1 files found. 
I0620 21:46:58.397238   865 Utils.cpp:102] Filtered 0/1 samples
I0620 21:46:58.397258   865 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 1
F0620 21:46:58.397612   877 Decode.cpp:422] FLAGS_nthread_decoder exceeds the number of visible GPUs
*** Check failure stack trace: ***
F0620 21:46:58.397619   878 Decode.cpp:422] FLAGS_nthread_decoder exceeds the number of visible GPUsF0F20 21:46:58.397660 6 879 20c0 21c] FL658.3970000:46:58.3Decod .cpp:42206208]   ecode.cpp:422S_nthread_decoder exceeds the number of visible GPUs
21:46:58.397663   882 Decode.cpp:422] FLAGS_nthread_decoder exceeds the number of visible GPUs
*** Check failure stack trace: ***
Aborted (core dumped)

tlikhomanenko commented 4 years ago

@abhinavkulkarni

For am_tds_s2s_librivox_dev_other: since it is a seq2seq model, the AM forward pass happens at each decoder step, so it runs on the GPU. You need to set nthread_decoder to the number of GPUs you have.
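
For example (a sketch, assuming nvidia-smi is available in your environment):

# use the number of visible GPUs as the decoder thread count
NGPU=$(nvidia-smi -L | wc -l)
/root/wav2letter/build/Decoder \
  ... same flags as in your command above ... \
  --nthread_decoder=$NGPU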

About decoding am_transformer_ctc_librivox_dev_other:

Let me know the results from the above.

tlikhomanenko commented 4 years ago

@abhinavkulkarni did you solve the issue with the decoder hanging?