Facebook AI Research's Automatic Speech Recognition Toolkit
Segmentation fault (core dumped), decode, wsj data #333

Bang-sheng-Zhuo commented 5 years ago

run command

./wav2letter/build/Decoder --flagsfile wav2letter/recipes/wsj/configs/conv_glu/decode.cfg

Gflags after parsing

--am=/root/dataset/ckpt/wsj_gluam/001_model_data#nov93dev.bin; --arch=network.arch; --archdir=/root/wav2letter/recipes/wsj/configs/conv_glu/; --batchsize=4; --beamsize=5000; --beamthreshold=25; --channels=1; --criterion=asg; --datadir=/root/dataset/wsj/; --decodertype=wrd; --flagsfile=wav2letter/recipes/wsj/configs/conv_glu/decode.cfg; --lexicon=; --lm=/root/dataset/wsj/lm/lm-4g.bin; --lm_memory=5000; --lmtype=kenlm; --lmweight=5.5; --test=data/nov92; --tokens=tokens.txt; --tokensdir=/root/dataset/wsj/data/; --train=data/si284; --valid=data/nov93dev; --wordscore=2.1000000000000001;

stack trace

I0617 04:09:06.106374 12224 Decode.cpp:117] Number of classes (network): 30
I0617 04:09:06.106578 12224 NumberedFilesLoader.cpp:29] Adding dataset /root/dataset/wsj/data/nov92 ...
I0617 04:09:06.106910 12224 NumberedFilesLoader.cpp:68] 333 files found.
I0617 04:09:06.183238 12224 Utils.cpp:102] Filtered 0/333 samples
I0617 04:09:06.183429 12224 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 333
I0617 04:09:06.183524 12224 Decode.cpp:138] [Serialization] Running forward pass ...
I0617 04:09:21.771307 12224 Decode.cpp:185] [Dataset] Number of samples per thread: 42
I0617 04:09:21.942915 12224 Decode.cpp:268] [Decoder] LM constructed.
I0617 04:09:21.943946 12237 Decode.cpp:335] [Decoder] Decoder with word-LM loaded in thread: 0
*** Aborted at 1560744561 (unix time) try "date -d @1560744561" if you are using GNU date ***
I0617 04:09:21.945180 12233 Decode.cpp:335] [Decoder] Decoder with word-LM loaded in thread: 1
PC: @           0x55d9d0 (unknown)
I0617 04:09:21.945837 12234 Decode.cpp:335] [Decoder] Decoder with word-LM loaded in thread: 2
I0617 04:09:21.945968 12239 Decode.cpp:335] [Decoder] Decoder with word-LM loaded in thread: 6
I0617 04:09:21.945996 12232 Decode.cpp:335] [Decoder] Decoder with word-LM loaded in thread: 4
I0617 04:09:21.946020 12235 Decode.cpp:335] [Decoder] Decoder with word-LM loaded in thread: 5
I0617 04:09:21.946043 12236 Decode.cpp:335] [Decoder] Decoder with word-LM loaded in thread: 3
I0617 04:09:21.946066 12238 Decode.cpp:335] [Decoder] Decoder with word-LM loaded in thread: 7
*** SIGSEGV (@0x0) received by PID 12224 (TID 0x7f3688ac3700) from PID 0; stack trace: ***
    @     0x7f3778d49390 (unknown)
    @           0x55d9d0 (unknown)
    @           0x553728 w2l::LexiconDecoder::decodeBegin()
    @           0x4775d4 _ZZ4mainENKUliiiE2_clEiii
    @           0x47aafa _ZNSt17_Function_handlerIFSt10unique_ptrINSt13__future_base12_Result_baseENS2_8_DeleterEEvENS1_12_Task_setterIS0_INS1_7_ResultIvEES3_ESt12_Bind_simpleIFSt17reference_wrapperISt5_BindIFZ4mainEUliiiE2_iiiEEEvEEvEEE9_M_invokeERKSt9_Any_data
    @           0x47ff59 std::__future_base::_State_baseV2::_M_do_set()
    @     0x7f3778d46a99 __pthread_once_slow
    @           0x4763c1 _ZNSt13__future_base11_Task_stateISt5_BindIFZ4mainEUliiiE2_iiiEESaIiEFvvEE6_M_runEv
    @           0x48589b _ZNSt6thread5_ImplISt12_Bind_simpleIFZN2fl10ThreadPoolC4EmRKSt8functionIFvmEEEUlvE_vEEE6_M_runEv
    @     0x7f373031fc80 (unknown)
    @     0x7f3778d3f6ba start_thread
    @     0x7f372fa8541d clone
    @                0x0 (unknown)
Segmentation fault (core dumped)

By the way, I trained the GLU model on WSJ data on another machine. Decoding runs fine there, but the result is WER: nan, LER: 100. The TER values from training are train-TER: 9.87 | data/nov93dev-TER: 11.95. I built the LM binary with KenLM.

xuqiantong commented 5 years ago

It seems you forgot to provide a lexicon. Please refer to https://github.com/facebookresearch/wav2letter/blob/master/docs/decoder.md for details.

Bang-sheng-Zhuo commented 5 years ago

Thanks. The latest Docker image of w2l provides the --words flag, not --lexicon, in recipes/wsj/configs/conv_glu/decode.cfg. I thought it would work if I followed the configs.
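A mismatch like this (a config that sets --words while the decoder reads --lexicon, leaving --lexicon empty as in the parsed flags above) could be caught before launching the Decoder. The following is a minimal sketch, not part of wav2letter: the helper names `parse_flagsfile` and `check_decode_flags` are hypothetical, and it assumes the flags file uses the usual gflags layout of one `--name=value` per line with `#` comment lines.

```python
import sys
from pathlib import Path


def parse_flagsfile(text):
    """Parse gflags-style flagsfile text into a dict of flag name -> value.

    Assumption: one flag per line; lines starting with '#' are comments.
    A '#' inside a value (e.g. in a model file name) is kept as-is.
    """
    flags = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or not line.startswith("-"):
            continue
        name, sep, value = line.lstrip("-").partition("=")
        # A bare "--flag" with no "=" is treated as a boolean switch.
        flags[name] = value if sep else "true"
    return flags


def check_decode_flags(flags):
    """Return a list of human-readable problems found in the parsed flags."""
    problems = []
    if not flags.get("lexicon"):
        problems.append("--lexicon is empty or missing")
    if "words" in flags:
        problems.append("--words is set; this thread indicates current builds expect --lexicon")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_flags.py path/to/decode.cfg
    found = check_decode_flags(parse_flagsfile(Path(sys.argv[1]).read_text()))
    for problem in found:
        print("WARNING:", problem)
    sys.exit(1 if found else 0)
```

Running it against decode.cfg before decoding would surface an empty lexicon as a warning rather than as a segmentation fault inside LexiconDecoder::decodeBegin().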

tlikhomanenko commented 5 years ago

Hi @k1kyo,

The Docker image was updated two weeks ago, and it already provides the --lexicon flag. Please pull the image to your local machine.

I will shortly update the images with the current state as well, in case you need them.

Bang-sheng-Zhuo commented 5 years ago

I just pulled the newest cuda-latest image, which was created two weeks ago, to my local machine, and I found that the decode.cfg in the WSJ recipes still provides the --words flag, not the --lexicon flag.

xuqiantong commented 5 years ago

Everything should be fine on master now. I think we were in transition two weeks ago. Sorry for the confusion.

tlikhomanenko commented 5 years ago

Hi @k1kyo,

I have updated the images with the current master. You can pull again and everything should work now. Please ping us if any errors remain.

Bang-sheng-Zhuo commented 5 years ago

@tlikhomanenko thanks