Error 'ArrayFire Exception (Device out of memory:101) when TDS CTC is decoded

flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit

https://github.com/facebookresearch/wav2letter/wiki

Other

6.39k stars 1.01k forks source link

Error 'ArrayFire Exception (Device out of memory:101) when TDS CTC is decoded #711

Closed ML6634 closed 3 years ago

ML6634 commented 4 years ago

I used --iter=100000 instead of 10000000 to train the TDS CTC model. At decoding, I got

...
Unable to allocate memory with native alloc for size 41943040 bytes with error 'ArrayFire Exception (Device out of memory:101):
ArrayFire error: 
In function fl::MemoryManagerInstaller::MemoryManagerInstaller(std::shared_ptr<fl::MemoryManagerAdapter>)::<lambda(size_t)>
In file /root/flashlight/flashlight/memory/MemoryManagerInstaller.cpp:178'terminate called after throwing an instance of 'af::exception'
  what():  ArrayFire Exception (Unknown error:999):

In function T* af::array::device() const [with T = void]
In file src/api/cpp/array.cpp:1024
*** Aborted at 1592624340 (unix time) try "date -d @1592624340" if you are using GNU date ***
PC: @     0x7f077ba1ae97 gsignal
*** SIGABRT (@0x5a) received by PID 90 (TID 0x7f07c12ea380) from PID 90; stack trace: ***
    @     0x7f07b9600890 (unknown)
    @     0x7f077ba1ae97 gsignal
    @     0x7f077ba1c801 abort
    @     0x7f077c40f957 (unknown)
    @     0x7f077c415ab6 (unknown)
    @     0x7f077c415af1 std::terminate()
    @     0x7f077c415d24 __cxa_throw
    @     0x7f079c66a728 af::array::device<>()
    @     0x55793ed1bc4c fl::DevicePtr::DevicePtr()
    @     0x55793ed9823d fl::conv2d()
    @     0x55793ed83c0f fl::AsymmetricConv1D::forward()
    @     0x55793ed57cfe fl::UnaryModule::forward()
    @     0x55793ed699b5 fl::WeightNorm::forward()
    @     0x55793ed8b2a1 fl::Residual::forward()
    @     0x55793ed8b3fd fl::Residual::forward()
    @     0x55793ed4369a fl::Sequential::forward()
    @     0x55793ec62a4a _ZZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEENKUlRKSt6vectorIiSaIiEES8_iiE_clES8_S8_ii
    @     0x55793ec63866 _ZNSt17_Function_handlerIFSt6vectorIS0_IfSaIfEESaIS2_EERKS0_IiSaIiEES8_iiEZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEEUlS8_S8_iiE_E9_M_invokeERKSt9_Any_dataS8_S8_OiSK_
    @     0x55793ebc6ba0 w2l::ConvLM::scoreWithLmIdx()
    @     0x55793ebc7264 w2l::ConvLM::score()
    @     0x55793ea7a9f6 main
    @     0x7f077b9fdb97 __libc_start_main
    @     0x55793ead6d2a _start
Aborted (core dumped)

Any ideas to take care of this? Thank you!

tlikhomanenko commented 4 years ago

@ML6634

could you give more details what are you running? Do you run Decode or Train binaries?

ML6634 commented 4 years ago

Thank @tlikhomanenko for asking. I got it when I ran Decode (see the bottom). If you need more information, please feel free to let me know. Thank you!

root@94fa6edf8e14:~# wav2letter/build/Decoder --flagsfile wav2letter/recipes/models/sota/2019/librispeech/decode_tds_ctc_gcnn_clean.cfg --minloglevel=0 --logtostderr=1
I0620 03:38:50.810289    90 Decode.cpp:58] Reading flags from file wav2letter/recipes/models/sota/2019/librispeech/decode_tds_ctc_gcnn_clean.cfg
I0620 03:38:50.810439    90 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0620 03:38:51.475615    90 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
    (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 )
    (1): View (-1 80 1 0)
    (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (3): ReLU
    (4): Dropout (0.000000)
    (5): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (12): ReLU
    (13): Dropout (0.000000)
    (14): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (22): ReLU
    (23): Dropout (0.000000)
    (24): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (35): View (0 1440 1 0)
    (36): Reorder (1,0,3,2)
    (37): Linear (1440->9998) (with bias)
I0620 03:38:51.475695    90 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion
I0620 03:38:51.475713    90 Decode.cpp:84] [Network] Number of params: 203394122
I0620 03:38:51.475725    90 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0620 03:38:51.476186    90 Decode.cpp:106] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=250; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=/root/w2l/lists; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/recipes/models/sota/2019/librispeech/decode_tds_ctc_gcnn_clean.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab; --lmtype=convlm; --lmweight=0.77112819889331996; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=8; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=/root/w2l/saved_models/am_tds_ctc_librispeech; --seed=2; --show=true; --showletters=true; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --warmup=8000; --weightdecay=0; --wordscore=0.35770072381611001; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0620 03:38:51.479128    90 Decode.cpp:127] Number of classes (network): 9998
I0620 03:38:52.504566    90 Decode.cpp:134] Number of words: 200001
I0620 03:38:52.588066    90 Decode.cpp:231] [ConvLM]: Loading LM from /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
[ConvLM]: Loading vocabulary from /root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab
[ConvLM]: vocabulary size of convLM 221452
I0620 03:38:55.685602    90 Decode.cpp:247] [Decoder] LM constructed.
Unable to allocate memory with native alloc for size 41943040 bytes with error 'ArrayFire Exception (Device out of memory:101):
ArrayFire error: 
In function fl::MemoryManagerInstaller::MemoryManagerInstaller(std::shared_ptr<fl::MemoryManagerAdapter>)::<lambda(size_t)>
In file /root/flashlight/flashlight/memory/MemoryManagerInstaller.cpp:178'terminate called after throwing an instance of 'af::exception'
  what():  ArrayFire Exception (Unknown error:999):

In function T* af::array::device() const [with T = void]
In file src/api/cpp/array.cpp:1024
*** Aborted at 1592624340 (unix time) try "date -d @1592624340" if you are using GNU date ***
PC: @     0x7f077ba1ae97 gsignal
*** SIGABRT (@0x5a) received by PID 90 (TID 0x7f07c12ea380) from PID 90; stack trace: ***
    @     0x7f07b9600890 (unknown)
    @     0x7f077ba1ae97 gsignal
    @     0x7f077ba1c801 abort
    @     0x7f077c40f957 (unknown)
    @     0x7f077c415ab6 (unknown)
    @     0x7f077c415af1 std::terminate()
    @     0x7f077c415d24 __cxa_throw
    @     0x7f079c66a728 af::array::device<>()
    @     0x55793ed1bc4c fl::DevicePtr::DevicePtr()
    @     0x55793ed9823d fl::conv2d()
    @     0x55793ed83c0f fl::AsymmetricConv1D::forward()
    @     0x55793ed57cfe fl::UnaryModule::forward()
    @     0x55793ed699b5 fl::WeightNorm::forward()
    @     0x55793ed8b2a1 fl::Residual::forward()
    @     0x55793ed8b3fd fl::Residual::forward()
    @     0x55793ed4369a fl::Sequential::forward()
    @     0x55793ec62a4a _ZZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEENKUlRKSt6vectorIiSaIiEES8_iiE_clES8_S8_ii
    @     0x55793ec63866 _ZNSt17_Function_handlerIFSt6vectorIS0_IfSaIfEESaIS2_EERKS0_IiSaIiEES8_iiEZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEEUlS8_S8_iiE_E9_M_invokeERKSt9_Any_dataS8_S8_OiSK_
    @     0x55793ebc6ba0 w2l::ConvLM::scoreWithLmIdx()
    @     0x55793ebc7264 w2l::ConvLM::score()
    @     0x55793ea7a9f6 main
    @     0x7f077b9fdb97 __libc_start_main
    @     0x55793ead6d2a _start
Aborted (core dumped)

tlikhomanenko commented 4 years ago

What is GPU memory on your machine? Try to set lm_memory=500

ML6634 commented 4 years ago

The GPU memory on my machine is 4 GB:

~$ glxinfo | egrep -i 'device|memory'
    GLX_NV_multigpu_context, GLX_NV_robustness_video_memory_purge, 
    GLX_NV_robustness_video_memory_purge, GLX_NV_swap_group, 
    GLX_NV_multigpu_context, GLX_NV_robustness_video_memory_purge, 
Memory info (GL_NVX_gpu_memory_info):
    Dedicated video memory: 4096 MB
    Total available memory: 4096 MB
    ...

Also see: https://github.com/facebookresearch/wav2letter/issues/683#issue-633790334.

I tried setting lm_memory=500, ... and even down to 1, it produced the same error when I ran it. Any other ways? Thank you!

tlikhomanenko commented 4 years ago

Well, the problem that both am and lm are loaded on the GPU, we do removing of AM from GPU as soon as predictions are ready for CTC models, but loading of LM happens before loading AM (for example to make sure your program will not crash after AM forward during LM loading and additional decoder things).

Solution for you to try with GPU memory restriction you have:

Run Test before and save emissions into emission_dir
Run decoding with providing emission_dir - in this case AM forward will not be running and predictions will be read from the saved on disk files in the emission_dir

Docs on Test and on emission dir check on decoder wiki page https://github.com/facebookresearch/wav2letter/wiki/Beam-Search-Decoder.

ML6634 commented 4 years ago

I ran

root@94fa6edf8e14:~# wav2letter/build/Test \
>   --am /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin\
>   --maxload 10 \
>   --emission_dir /root/w2l/saved_models/pre-trained_acoustic_models \
>   --test test-clean.lst

and

root@94fa6edf8e14:~# wav2letter/build/Test \
>   --am /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin\
>   --maxload 10 \
>   --test test-clean.lst

Both gave an error like:

terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid dictionary filepath specified.
*** Aborted at 1592894594 (unix time) try "date -d @1592894594" if you are using GNU date ***

Any ideas? Thank you!

tlikhomanenko commented 4 years ago

Did you read docs https://github.com/facebookresearch/wav2letter/wiki/Beam-Search-Decoder?

While running the Test binary, the AM is loaded and all the saved flags will be used if you don’t specify them in the command line. For example, tokens and lexicon paths. So, in case you want to overwrite them, you should directly specify them:

wav2letter/build/Test \
  --am path/to/train/am.bin \
  --maxload 10 \
  --test path/to/test/list/file \
  --tokensdir path/to/tokens/dir \
  --tokens tokens.txt \
  --lexicon path/to/the/lexicon/file

ML6634 commented 4 years ago

When I ran Decoder, I got an error:

root@02ae45d9e4ed:~# wav2letter/build/Decoder \
>   --flagsfile decode.cfg \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
E0627 04:58:07.188414   271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux
E0627 04:58:08.188877   271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux
E0627 04:58:10.189286   271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux
E0627 04:58:14.189731   271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux
E0627 04:58:22.190088   271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux
E0627 04:58:38.190415   271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux
terminate called after throwing an instance of 'std::length_error'
  what():  basic_string::_M_replace_aux
*** Aborted at 1593233918 (unix time) try "date -d @1593233918" if you are using GNU date ***
PC: @     0x7fc4b61d3e97 gsignal
*** SIGABRT (@0x10f) received by PID 271 (TID 0x7fc4fbaa3380) from PID 271; stack trace: ***
    @     0x7fc4f3db9890 (unknown)
    @     0x7fc4b61d3e97 gsignal
    @     0x7fc4b61d5801 abort
    @     0x7fc4b6bc8957 (unknown)
    @     0x7fc4b6bceab6 (unknown)
    @     0x7fc4b6bceaf1 std::terminate()
    @     0x7fc4b6bced79 __cxa_rethrow
    @     0x55d976031427 _ZN3w2l16retryWithBackoffIRFvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt13unordered_mapIS6_S6_St4hashIS6_ESt8equal_toIS6_ESaISt4pairIS7_S6_EEERSt10shared_ptrIN2fl6ModuleEERSJ_INS_17SequenceCriterionEEEJS8_SI_SN_SQ_EEENSt9result_ofIFT_DpT0_EE4typeENSt6chrono8durationIdSt5ratioILl1ELl1EEEEdlOSU_DpOSV_.constprop.7511
    @     0x55d975fcf717 main
    @     0x7fc4b61b6b97 __libc_start_main
    @     0x55d97602dd2a _start
Aborted (core dumped)

my decode.cfg is:

--am=/root/wav2letter/src/decoder/test/emission.bin
--test=/root/w2l/lists/test-clean.lst
--maxload=10
--nthread_decoder=2
--show
--sholetters
--lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon         
--uselexicon=true
--lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
--lmtype=kenlm
--decodertype=wrd
--beamsize=100
--beamsizetoken=100
--beamthreshold=20

In addition,

root@02ae45d9e4ed:~/wav2letter/src/decoder/test# ls -l
total 4344
-rw-r--r-- 1 root root    6715 May  3 03:20 DecoderTest.cpp
-rw-r--r-- 1 root root       8 May  3 03:20 TN.bin
-rw-r--r-- 1 root root   27260 May  3 03:20 emission.bin
-rw-r--r-- 1 root root      56 May  3 03:20 letters.lst
-rw-r--r-- 1 root root 3727822 May  3 03:20 lm.arpa
-rw-r--r-- 1 root root    3364 May  3 03:20 transition.bin
-rw-r--r-- 1 root root  663876 May  3 03:20 words.lst

May I ask how to take care of the above bug? Thank you!,

tlikhomanenko commented 4 years ago

First ~/wav2letter/src/decoder/test contains data for decoder test, you cannot use them in the way you are doing. Emissions.bin stores emissions data, not an acoustic model.

What you need to do: 1) run Test binary for 10 samples with am you downloaded, like:

wav2letter/build/Test \
  --am path/to/am.bin \
  --maxload 10 \
  --test path/to/test/list/file \
  --tokensdir path/to/tokens/dir \
  --tokens tokens.txt \
  --lexicon path/to/the/lexicon/file
  --emission_dir path/where/to/store/emissions

2) Run decoder with specifying additional parameter in the config --emission_dir=path/where/to/store/emissions using above path.

For the 1) you can use values of parameters from the decoder config you used above wav2letter/recipes/models/sota/2019/librispeech/decode_tds_ctc_gcnn_clean.cfg

Are the steps clearer now?

ML6634 commented 4 years ago

Thank @tlikhomanenko for the help!

The Test:

root@a9a8aeab9c4c:~# wav2letter/build/Test \
>   --am /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin \
>   --maxload 10 \
>   --test /root/w2l/lists/test-clean.lst \
>   --tokensdir /root/w2l/am \
>   --tokens librispeech-train-all-unigram-10000.tokens \
>   --lexicon /root/w2l/am/librispeech-train+dev-unigram-10000-nbest10.lexicon \
>   --emission_dir /root/w2l/saved_models/pre-trained_acoustic_models

went through.

However,

root@a9a8aeab9c4c:~# wav2letter/build/Decoder \
>   --flagsfile test_decode.cfg \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
terminate called after throwing an instance of 'std::runtime_error'
  what():  Invalid dictionary filepath specified.
*** Aborted at 1593382447 (unix time) try "date -d @1593382447" if you are using GNU date ***
PC: @     0x7f1fe3ea8e97 gsignal
*** SIGABRT (@0xf8) received by PID 248 (TID 0x7f2029778380) from PID 248; stack trace: ***
    @     0x7f2021a8e890 (unknown)
    @     0x7f1fe3ea8e97 gsignal
    @     0x7f1fe3eaa801 abort
    @     0x7f1fe489d957 (unknown)
    @     0x7f1fe48a3ab6 (unknown)
    @     0x7f1fe48a3af1 std::terminate()
    @     0x7f1fe48a3d24 __cxa_throw
    @     0x555bff8278bd main
    @     0x7f1fe3e8bb97 __libc_start_main
    @     0x555bff882d2a _start
Aborted (core dumped)

test_decode.cfg is:

--emission_dir=/root/w2l/saved_models/pre-trained_acoustic_models          
--test=/root/w2l/lists/test-clean.lst
--maxload=10
--nthread_decoder=2
--show
--sholetters
--lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon         
--uselexicon=true
--lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
--lmtype=kenlm
--decodertype=wrd
--beamsize=100
--beamsizetoken=100
--beamthreshold=20

Any comments about the bug? Thank you!

tlikhomanenko commented 4 years ago

still add to the decoder config

--tokensdir /root/w2l/am
--tokens librispeech-train-all-unigram-10000.tokens

ML6634 commented 4 years ago

Thank @tlikhomanenko for pointing it out!

I added:

--tokensdir=/root/w2l/am
--tokens=librispeech-train-all-unigram-10000.tokens

into the config. Then, when I ran Decoder, I got an error associate with memory accessing:

root@a9a8aeab9c4c:~# wav2letter/build/Decoder   --flagsfile test_decode.cfg   --lmweight 1   --wordscore 0   --eosscore 0   --silscore 0   --unkscore 0   --smearing max
*** Aborted at 1593398213 (unix time) try "date -d @1593398213" if you are using GNU date ***
PC: @     0x55ef442d8248 fl::Module::param()
*** SIGSEGV (@0x8) received by PID 268 (TID 0x7f197f510380) from PID 8; stack trace: ***
    @     0x7f1977826890 (unknown)
    @     0x55ef442d8248 fl::Module::param()
    @     0x55ef43ff93ba main
    @     0x7f1939c23b97 __libc_start_main
    @     0x55ef44056d2a _start
Segmentation fault (core dumped)

tlikhomanenko commented 4 years ago

Please add empty set for am: --am=

ML6634 commented 4 years ago

Added it in. Unfortunately, the error stays the same.

ML6634 commented 4 years ago

So, any possible ways to take care of this? Thank you!

tlikhomanenko commented 4 years ago

Can you post the full log?

ML6634 commented 4 years ago

001_log:

Thank you!

tlikhomanenko commented 4 years ago

I suspect the problem in the random samples for which emissions are saved. Can you make the new list with 10 head samples from you original list. Run again Test but on this new list, specifying maxload=-1 and then Decode with this new list too and maxload=-1

ML6634 commented 4 years ago

My new list:

When I ran Test with the new list, I got an error:

root@cfcc3017cc69:~# wav2letter/build/Test \
>   --am /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin \
>   --maxload -1 \
>   --test /root/w2l/lists/new-test-clean.lst \
>   --tokensdir /root/w2l/am \
>   --tokens librispeech-train-all-unigram-10000.tokens \
>   --lexicon /root/w2l/am/librispeech-train+dev-unigram-10000-nbest10.lexicon \
>   --emission_dir /root/w2l/saved_models/pre-trained_acoustic_models
I0702 16:10:35.827715   187 Test.cpp:83] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/saved_models/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO/100.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/am/librispeech-train+dev-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=false; --showletters=false; --silscore=0; --smearing=none; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --warmup=8000; --weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0702 16:10:35.830458   187 Test.cpp:104] Number of classes (network): 9998
I0702 16:10:36.297570   187 Test.cpp:111] Number of words: 89333
F0702 16:10:36.470782   187 W2lListFilesDataset.cpp:116] Cannot parse 
*** Check failure stack trace: ***
    @     0x7f2ecfb240cd  google::LogMessage::Fail()
    @     0x7f2ecfb25f33  google::LogMessage::SendToLog()
    @     0x7f2ecfb23c28  google::LogMessage::Flush()
    @     0x7f2ecfb26999  google::LogMessageFatal::~LogMessageFatal()
    @     0x563e4894fdcb  w2l::W2lListFilesDataset::loadListFile()
    @     0x563e4895092d  w2l::W2lListFilesDataset::W2lListFilesDataset()
    @     0x563e4897459e  w2l::createDataset()
    @     0x563e48773e00  main
    @     0x7f2ecee09b97  __libc_start_main
    @     0x563e487d081a  _start
Aborted (core dumped)

What is wrong with that? Thank you!

tlikhomanenko commented 4 years ago

Did you do head of file? or copy somehow else? your files cannot be parsed, which meas format is different from expected one. Make sure you have correct format as for your original list.

ML6634 commented 4 years ago

Thank @tlikhomanenko for the quick response! I chose 10 samples from:

Right now original Test with test-clean.lst works, but it does not work with the new list and maxload = -1. Even if I copy and paste the top 10 samples from test-clean.lst to make the new list, Test does work with the new list.

ML6634 commented 4 years ago

I checked a few times. It sounds the new list and the original list, test-clean.lst, have the same format (re: tabs, blank spaces, end of line markers etc.) .

tlikhomanenko commented 4 years ago

Can you try to remove emission dir and run it again?

tlikhomanenko commented 4 years ago

Also you can try with tail of the file. Just to debug

ML6634 commented 4 years ago

Thank @tlikhomanenko for the help and patience! The original list does not have an end-of-line marker right after the last sample, but I have. I fixed it, and Test went through. Unfortunately, after that when I ran Decoder with test_decode.cfg:

--emission_dir=/root/w2l/saved_models/pre-trained_acoustic_models
--tokensdir=/root/w2l/am
--tokens=librispeech-train-all-unigram-10000.tokens
--test=/root/w2l/lists/new-test-clean.lst
--maxload=-1
--nthread_decoder=2
--show
--sholetters
--lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon
--uselexicon=true
--lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
--lmtype=kenlm
--decodertype=wrd
--beamsize=100
--beamsizetoken=100
--beamthreshold=20

I got the same error as the above (See https://github.com/facebookresearch/wav2letter/issues/711#issuecomment-650875083):

root@cfcc3017cc69:~# wav2letter/build/Decoder \
>   --flagsfile wav2letter/build/test_decode.cfg \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
*** Aborted at 1593719110 (unix time) try "date -d @1593719110" if you are using GNU date ***
PC: @     0x560a4f4aa248 fl::Module::param()
*** SIGSEGV (@0x8) received by PID 311 (TID 0x7efd9471b380) from PID 8; stack trace: ***
    @     0x7efd8ca31890 (unknown)
    @     0x560a4f4aa248 fl::Module::param()
    @     0x560a4f1cb3ba main
    @     0x7efd4ee2eb97 __libc_start_main
    @     0x560a4f228d2a _start
Segmentation fault (core dumped)

tlikhomanenko commented 4 years ago

You are using convlm, not kenlm, so it should be

--lmtype=convlm

ML6634 commented 4 years ago

Thank @tlikhomanenko for the patience and help! On my new hard drive, I ran the Test on a list with 10 head samples from the original list, and it went through. With the change

--lmtype=convlm

in the decoder config:

--am=
--emission_dir=/root/w2l/pre-trained_acoustic_models
--tokensdir=/root/w2l/am
--tokens=librispeech-train-all-unigram-10000.tokens
--test=/root/w2l/lists/new-test-clean.lst
--maxload=-1
--nthread_decoder=2
--show
--sholetters
--lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon
--uselexicon=true
--lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
--lmtype=convlm
--decodertype=wrd
--beamsize=100
--beamsizetoken=100
--beamthreshold=20

the error stays the same when I ran Decoder.

tlikhomanenko commented 4 years ago

Can you post full screen output of your decoder run?

ML6634 commented 4 years ago

Thanks for asking! Here you go:

root@e90078050412:~# wav2letter/build/Decoder \
>   --flagsfile wav2letter/build/test_decode.cfg \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
*** Aborted at 1596343161 (unix time) try "date -d @1596343161" if you are using GNU date ***
PC: @     0x55cd76e8f528 (unknown)
*** SIGSEGV (@0x8) received by PID 120 (TID 0x7fadbb7c3380) from PID 8; stack trace: ***
    @     0x7fadb3ad9890 (unknown)
    @     0x55cd76e8f528 (unknown)
    @     0x55cd76bb3473 (unknown)
    @     0x7fad75ed6b97 __libc_start_main
    @     0x55cd76c13b2a (unknown)
Segmentation fault (core dumped)

ML6634 commented 4 years ago

Anyone who runs successfully the TDS CTC model or a similar wav2letter model on a machine with only 1 GPU (4 GB)? If so, how? Thank you!

tlikhomanenko commented 4 years ago

Can you add --minloglevel=0 --logtostderr=1 to see the full log on the screen (you should see for example data/model loading, as well as all flags printed on the screen)?

tlikhomanenko commented 4 years ago

And again, could you confirm that decode command is working for you in case of kenlm ngram model decoding (without saving emissions and with saved emissions)?

ML6634 commented 4 years ago

root@e90078050412:~# wav2letter/build/Decoder \
>   --flagsfile wav2letter/build/test_decode.cfg \
>   --minloglevel 0 \
>   --logtostderr 1 \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
I0805 09:07:37.017644   122 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg
I0805 09:07:37.017838   122 Decode.cpp:106] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=default; --archdir=; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=9223372036854775807; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=1; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=false; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=2; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=; --runname=; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=false; --stepsize=9223372036854775807; --surround=; --tag=; --target=tkn; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0805 09:07:37.020831   122 Decode.cpp:127] Number of classes (network): 9997
I0805 09:07:38.188450   122 Decode.cpp:134] Number of words: 200001
*** Aborted at 1596618458 (unix time) try "date -d @1596618458" if you are using GNU date ***
PC: @     0x5586ff2e0528 (unknown)
*** SIGSEGV (@0x8) received by PID 122 (TID 0x7f1d39b0c380) from PID 8; stack trace: ***
    @     0x7f1d31e22890 (unknown)
    @     0x5586ff2e0528 (unknown)
    @     0x5586ff004473 (unknown)
    @     0x7f1cf421fb97 __libc_start_main
    @     0x5586ff064b2a (unknown)
Segmentation fault (core dumped)

ML6634 commented 4 years ago

I ran successfully the model (so both train and decode commands went through):

E2E Speech Recognition on Librispeech-Clean Dataset https://github.com/facebookresearch/wav2letter/tree/master/tutorials/1-librispeech_clean

So, would you like me to try other decode commands? Thank you!

tlikhomanenko commented 4 years ago

I want you to run exactly the same decoding command s you running now but with kenlm model, not convlm. And also try without saving emissions.

Another necessary fixes for convlm config:

you need to specify also --lm_vocab for convlm, this is mapping between convlm ptrdictions indices and tokens.
Also --nthread_decoder=1 because you have only 1 GPU and convlm will be run on GPU.
Could you add https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L221 at this row LOG(INFO) << "Here"; and before this row https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L230 LOG(INFO) << "Init device";, recompile, run again and post log here?

ML6634 commented 4 years ago

Thank @tlikhomanenko for the kind help! Inside the Docker container, I modified Decode.cpp, as suggested by @tlikhomanenko, in the wav2letter folder. When I compiled it, I got:

root@3f9d46061788:~/wav2letter# g++ Decode.cpp
In file included from /usr/local/include/flashlight/autograd/Variable.h:25:0,
                 from /usr/local/include/flashlight/autograd/Utils.h:11,
                 from /usr/local/include/flashlight/autograd/autograd.h:20,
                 from /usr/local/include/flashlight/flashlight.h:11,
                 from Decode.cpp:17:
/usr/local/include/flashlight/common/Serialization.h:23:10: fatal error: cereal/access.hpp: No such file or directory
 #include <cereal/access.hpp>
          ^~~~~~~~~~~~~~~~~~~
compilation terminated.

root@3f9d46061788:~/wav2letter# gcc Decode.cpp
In file included from /usr/local/include/flashlight/autograd/Variable.h:25:0,
                 from /usr/local/include/flashlight/autograd/Utils.h:11,
                 from /usr/local/include/flashlight/autograd/autograd.h:20,
                 from /usr/local/include/flashlight/flashlight.h:11,
                 from Decode.cpp:17:
/usr/local/include/flashlight/common/Serialization.h:23:10: fatal error: cereal/access.hpp: No such file or directory
 #include <cereal/access.hpp>
          ^~~~~~~~~~~~~~~~~~~
compilation terminated.

Any comments? What is the right command to compile it? Thank you!

tlikhomanenko commented 4 years ago

You should follow this compilation steps

https://github.com/facebookresearch/wav2letter/blob/master/Dockerfile-CUDA#L22

ML6634 commented 4 years ago

To compile it, is it correct to run the following command in the wav2letter folder?

$ sudo docker build -f Dockerfile-CUDA --no-cache .

Thank you!

tlikhomanenko commented 4 years ago

you don't need to rebuild full image, just go to your container where you are running everything and there change code as I pointed and then do commands

export MKLROOT=/opt/intel/mkl && export KENLM_ROOT_DIR=/root/kenlm && \
    cd /root/wav2letter && mkdir -p build && \
    cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_LIBRARIES_USE_CUDA=ON -DW2L_BUILD_INFERENCE=ON && \
    make -j$(nproc)

ML6634 commented 4 years ago

Thank @tlikhomanenko for the continuous help! With the above changes, for the convlm model, I got:

root@0614c5da94b4:~# wav2letter/build/Decoder \
>   --flagsfile wav2letter/build/test_decode.cfg \
>   --minloglevel 0 \
>   --logtostderr 1 \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
I0808 20:32:21.031849  1014 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg
I0808 20:32:21.032115  1014 Decode.cpp:106] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=default; --archdir=; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=9223372036854775807; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=1; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=false; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=; --runname=; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=false; --stepsize=9223372036854775807; --surround=; --tag=; --target=tkn; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=; --validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0808 20:32:21.035308  1014 Decode.cpp:127] Number of classes (network): 9997
I0808 20:32:22.087042  1014 Decode.cpp:134] Number of words: 200001
*** Aborted at 1596918742 (unix time) try "date -d @1596918742" if you are using GNU date ***
PC: @     0x557fd9df29c8 (unknown)
*** SIGSEGV (@0x8) received by PID 1014 (TID 0x7f3d357a2380) from PID 8; stack trace: ***
    @     0x7f3d2dab8890 (unknown)
    @     0x557fd9df29c8 (unknown)
    @     0x557fd9b0ab93 (unknown)
    @     0x7f3cefeb5b97 __libc_start_main
    @     0x557fd9b6ef3a (unknown)
Segmentation fault (core dumped)

ML6634 commented 4 years ago

Also, with the above changes, for the kenlm model, with saved emissions, I got:

root@0614c5da94b4:~# wav2letter/build/Decoder \
>   --flagsfile wav2letter/build/test_decode.cfg \
>   --minloglevel 0 \
>   --logtostderr 1 \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
I0808 20:36:01.440773  1017 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg
I0808 20:36:01.441000  1017 Decode.cpp:106] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=default; --archdir=; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=9223372036854775807; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=1; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=false; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=; --runname=; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=false; --stepsize=9223372036854775807; --surround=; --tag=; --target=tkn; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=; --validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0808 20:36:01.444080  1017 Decode.cpp:127] Number of classes (network): 9997
I0808 20:36:02.533449  1017 Decode.cpp:134] Number of words: 200001
*** Aborted at 1596918962 (unix time) try "date -d @1596918962" if you are using GNU date ***
PC: @     0x560c880929c8 (unknown)
*** SIGSEGV (@0x8) received by PID 1017 (TID 0x7f110128d380) from PID 8; stack trace: ***
    @     0x7f10f95a3890 (unknown)
    @     0x560c880929c8 (unknown)
    @     0x560c87daab93 (unknown)
    @     0x7f10bb9a0b97 __libc_start_main
    @     0x560c87e0ef3a (unknown)
Segmentation fault (core dumped)

ML6634 commented 4 years ago

For the kenlm model, with all those changes, but with the full path of --am added, and without saving emissions:

root@0614c5da94b4:~# wav2letter/build/Decoder \
>   --flagsfile wav2letter/build/test_decode.cfg \
>   --minloglevel 0 \
>   --logtostderr 1 \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
I0808 21:13:20.241331  1034 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg
I0808 21:13:20.241497  1034 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0808 21:13:20.908205  1034 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
    (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 )
    (1): View (-1 80 1 0)
    (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (3): ReLU
    (4): Dropout (0.000000)
    (5): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (12): ReLU
    (13): Dropout (0.000000)
    (14): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (22): ReLU
    (23): Dropout (0.000000)
    (24): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (35): View (0 1440 1 0)
    (36): Reorder (1,0,3,2)
    (37): Linear (1440->9998) (with bias)
I0808 21:13:20.908303  1034 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion
I0808 21:13:20.908306  1034 Decode.cpp:84] [Network] Number of params: 203394122
I0808 21:13:20.908358  1034 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0808 21:13:20.908658  1034 Decode.cpp:106] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0808 21:13:20.911502  1034 Decode.cpp:127] Number of classes (network): 9998
I0808 21:13:21.963410  1034 Decode.cpp:134] Number of words: 200001
I0808 21:13:22.048540  1034 Decode.cpp:221] Here
Loading the LM will be faster if you build a binary file.
Reading /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
terminate called after throwing an instance of 'util::EndOfFileException'
  what():  End of file Byte: 0
*** Aborted at 1596921202 (unix time) try "date -d @1596921202" if you are using GNU date ***
PC: @     0x7f4b77689e97 gsignal
*** SIGABRT (@0x40a) received by PID 1034 (TID 0x7f4bbcf59380) from PID 1034; stack trace: ***
    @     0x7f4bb526f890 (unknown)
    @     0x7f4b77689e97 gsignal
    @     0x7f4b7768b801 abort
    @     0x7f4b7807e957 (unknown)
    @     0x7f4b78084ab6 (unknown)
    @     0x7f4b78084af1 std::terminate()
    @     0x7f4b78084d79 __cxa_rethrow
    @     0x5625a8d4bf84 lm::ngram::detail::GenericModel<>::InitializeFromARPA()
    @     0x5625a8d4ddc5 lm::ngram::detail::GenericModel<>::GenericModel()
    @     0x5625a8d44fab lm::ngram::LoadVirtual()
    @     0x5625a8c6a649 w2l::KenLM::KenLM()
    @     0x5625a8b14480 main
    @     0x7f4b7766cb97 __libc_start_main
    @     0x5625a8b76f3a _start
Aborted (core dumped)

tlikhomanenko commented 4 years ago

For the kenlm model, with all those changes, but with the full path of --am added, and without saving emissions:

root@0614c5da94b4:~# wav2letter/build/Decoder \

--flagsfile wav2letter/build/test_decode.cfg \ --minloglevel 0 \ --logtostderr 1 \ --lmweight 1 \ --wordscore 0 \ --eosscore 0 \ --silscore 0 \ --unkscore 0 \ --smearing max I0808 21:13:20.241331 1034 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0808 21:13:20.241497 1034 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0808 21:13:20.908205 1034 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output] (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 ) (1): View (-1 80 1 0) (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (3): ReLU (4): Dropout (0.000000) (5): LayerNorm ( axis : { 0 1 2 } , size : -1) (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (12): ReLU (13): Dropout (0.000000) (14): LayerNorm ( axis : { 0 1 2 } , size : -1) (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (22): ReLU (23): Dropout (0.000000) (24): LayerNorm ( axis : { 0 1 2 } , size : -1) (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (35): View (0 1440 1 0) (36): Reorder (1,0,3,2) (37): Linear (1440->9998) (with bias) I0808 21:13:20.908303 1034 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion I0808 21:13:20.908306 1034 Decode.cpp:84] [Network] Number of params: 203394122 I0808 21:13:20.908358 1034 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0808 21:13:20.908658 1034 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=0; --usememcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0808 21:13:20.911502 1034 Decode.cpp:127] Number of classes (network): 9998 I0808 21:13:21.963410 1034 Decode.cpp:134] Number of words: 200001 I0808 21:13:22.048540 1034 Decode.cpp:221] Here Loading the LM will be faster if you build a binary file. Reading /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100 terminate called after throwing an instance of 'util::EndOfFileException' what(): End of file Byte: 0 Aborted at 1596921202 (unix time) try "date -d @1596921202" if you are using GNU date PC: @ 0x7f4b77689e97 gsignal SIGABRT (@0x40a) received by PID 1034 (TID 0x7f4bbcf59380) from PID 1034; stack trace: @ 0x7f4bb526f890 (unknown) @ 0x7f4b77689e97 gsignal @ 0x7f4b7768b801 abort @ 0x7f4b7807e957 (unknown) @ 0x7f4b78084ab6 (unknown) @ 0x7f4b78084af1 std::terminate() @ 0x7f4b78084d79 __cxa_rethrow @ 0x5625a8d4bf84 lm::ngram::detail::GenericModel<>::InitializeFromARPA() @ 0x5625a8d4ddc5 lm::ngram::detail::GenericModel<>::GenericModel() @ 0x5625a8d44fab lm::ngram::LoadVirtual() @ 0x5625a8c6a649 w2l::KenLM::KenLM() @ 0x5625a8b14480 main @ 0x7f4b7766cb97 __libc_start_main @ 0x5625a8b76f3a _start Aborted (core dumped)

Why you are setting convlm bin model to the lm path while you are using kenlm? Please be care when provides flags for your run.

Thank @tlikhomanenko for the continuous help! With the above changes, for the convlm model, I got:

root@0614c5da94b4:~# wav2letter/build/Decoder \

--flagsfile wav2letter/build/test_decode.cfg \ --minloglevel 0 \ --logtostderr 1 \ --lmweight 1 \ --wordscore 0 \ --eosscore 0 \ --silscore 0 \ --unkscore 0 \ --smearing max I0808 20:32:21.031849 1014 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0808 20:32:21.032115 1014 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=default; --archdir=; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=9223372036854775807; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=1; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=false; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=; --runname=; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=false; --stepsize=9223372036854775807; --surround=; --tag=; --target=tkn; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=; --validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0808 20:32:21.035308 1014 Decode.cpp:127] Number of classes (network): 9997 I0808 20:32:22.087042 1014 Decode.cpp:134] Number of words: 200001 Aborted at 1596918742 (unix time) try "date -d @1596918742" if you are using GNU date PC: @ 0x557fd9df29c8 (unknown) SIGSEGV (@0x8) received by PID 1014 (TID 0x7f3d357a2380) from PID 8; stack trace: @ 0x7f3d2dab8890 (unknown) @ 0x557fd9df29c8 (unknown) @ 0x557fd9b0ab93 (unknown) @ 0x7f3cefeb5b97 __libc_start_main @ 0x557fd9b6ef3a (unknown) Segmentation fault (core dumped)

Seems you didn't recompile. Just to be sure, add also LOG(INFO) << "Here begin"; on the row https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L136, compile and run again.

ML6634 commented 4 years ago

I added those three statements to Decode.cpp, and the build went through:

...
[ 87%] Built target inference_IdentityTest
[ 87%] Built target inference_Conv1dTest
[ 88%] Built target inference_LayerNormTest
[ 90%] Built target inference_LinearTest
[ 91%] Built target audio_to_words_example
[ 93%] Built target inference_TDSBlockTest
[ 94%] Built target multithreaded_streaming_asr_example
[ 96%] Built target interactive_streaming_asr_example
[ 96%] Built target simple_streaming_asr_example
[ 97%] Built target wav2letter-inference
[ 97%] Linking CXX executable Decoder
[100%] Built target Decoder

For the convlm model, I got:

root@73b1494e425d:~# wav2letter/build/Decoder \
>   --flagsfile wav2letter/build/test_decode.cfg \
>   --minloglevel 0 \
>   --logtostderr 1 \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
I0816 23:30:44.348644  2278 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg
I0816 23:30:44.507551  2278 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0816 23:33:30.168771  2278 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
    (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 )
    (1): View (-1 80 1 0)
    (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (3): ReLU
    (4): Dropout (0.000000)
    (5): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (12): ReLU
    (13): Dropout (0.000000)
    (14): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (22): ReLU
    (23): Dropout (0.000000)
    (24): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (35): View (0 1440 1 0)
    (36): Reorder (1,0,3,2)
    (37): Linear (1440->9998) (with bias)
I0816 23:33:30.169116  2278 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion
I0816 23:33:30.169129  2278 Decode.cpp:84] [Network] Number of params: 203394122
I0816 23:33:30.169189  2278 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0816 23:33:30.170029  2278 Decode.cpp:106] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0816 23:33:30.869755  2278 Decode.cpp:127] Number of classes (network): 9998
I0816 23:33:51.396348  2278 Decode.cpp:134] Number of words: 200001
I0816 23:33:51.396366  2278 Decode.cpp:136] Here begin
I0816 23:33:51.493278  2278 Decode.cpp:221] Here
I0816 23:33:51.493304  2278 Decode.cpp:230] Init device
I0816 23:33:51.493319  2278 Decode.cpp:232] [ConvLM]: Loading LM from /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
E0816 23:33:52.454282  2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0
E0816 23:33:53.454519  2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0
E0816 23:33:55.454686  2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0
E0816 23:33:59.454841  2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0
E0816 23:34:07.455020  2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0
E0816 23:34:23.455188  2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0
terminate called after throwing an instance of 'cereal::Exception'
  what():  Failed to read 8 bytes from input stream! Read 0
*** Aborted at 1597620863 (unix time) try "date -d @1597620863" if you are using GNU date ***
PC: @     0x7f6daa0f4e97 gsignal
*** SIGABRT (@0x8e6) received by PID 2278 (TID 0x7f6def9c4380) from PID 2278; stack trace: ***
    @     0x7f6de7cda890 (unknown)
    @     0x7f6daa0f4e97 gsignal
    @     0x7f6daa0f6801 abort
    @     0x7f6daaae9957 (unknown)
    @     0x7f6daaaefab6 (unknown)
    @     0x7f6daaaefaf1 std::terminate()
    @     0x7f6daaaefd79 __cxa_rethrow
    @     0x55701c0dc260 _ZN3w2l16retryWithBackoffIRFvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt10shared_ptrIN2fl6ModuleEEEJS8_SD_EEENSt9result_ofIFT_DpT0_EE4typeENSt6chrono8durationIdSt5ratioILl1ELl1EEEEdlOSH_DpOSI_
    @     0x55701c05c245 main
    @     0x7f6daa0d7b97 __libc_start_main
    @     0x55701c0c000a _start
Aborted (core dumped)

tlikhomanenko commented 4 years ago

Why did you previously have another error when using convlm? It wasn't cereal error.

ML6634 commented 4 years ago

Good question! Yesterday, I ran the docker image with CUDA in a new container, and got in the container, then ran Train, Test, and Decode by coping and pasting the code from my record into the container. Actually, I did the same process quite a few times, and each time the bug for Decode stayed the same except that yesterday it produced a cereal bug instead. For the most two recent executions of Decode for the convlm model (i.e., https://github.com/facebookresearch/wav2letter/issues/711#issuecomment-670971073 and https://github.com/facebookresearch/wav2letter/issues/711#issuecomment-674593031), I added --lm_vocab and changed --nthread_decoder to 1 in the Decode config file per the suggestion of @tlikhomanenko.

Anyway to get rid of the cereal bug? Thank you!!

tlikhomanenko commented 4 years ago

I have no idea what are you doing/changing. We didn't change anything that could affect the reading convlm model bin, so you should not have a cereal bug.

ML6634 commented 4 years ago

It sounds that something is wrong with my file: lm_librispeech_convlm_word_14B.bin. So, I re-downloaded it, and it fixed the cereal error. Then, I got:

root@5861d058cec8:~# wav2letter/build/Decoder \
>   --flagsfile wav2letter/build/test_decode.cfg \
>   --minloglevel 0 \
>   --logtostderr 1 \
>   --lmweight 1 \
>   --wordscore 0 \
>   --eosscore 0 \
>   --silscore 0 \
>   --unkscore 0 \
>   --smearing max
I0823 22:16:34.447176  1083 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg
I0823 22:16:34.447362  1083 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0823 22:16:35.110081  1083 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
    (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 )
    (1): View (-1 80 1 0)
    (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (3): ReLU
    (4): Dropout (0.000000)
    (5): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (12): ReLU
    (13): Dropout (0.000000)
    (14): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (22): ReLU
    (23): Dropout (0.000000)
    (24): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (35): View (0 1440 1 0)
    (36): Reorder (1,0,3,2)
    (37): Linear (1440->9998) (with bias)
I0823 22:16:35.110160  1083 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion
I0823 22:16:35.110162  1083 Decode.cpp:84] [Network] Number of params: 203394122
I0823 22:16:35.110174  1083 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0823 22:16:35.110532  1083 Decode.cpp:106] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0823 22:16:35.113325  1083 Decode.cpp:127] Number of classes (network): 9998
I0823 22:16:36.147918  1083 Decode.cpp:134] Number of words: 200001
I0823 22:16:36.147938  1083 Decode.cpp:136] Here begin
I0823 22:16:36.228821  1083 Decode.cpp:221] Here
I0823 22:16:36.228848  1083 Decode.cpp:230] Init device
I0823 22:16:36.228873  1083 Decode.cpp:232] [ConvLM]: Loading LM from /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
[ConvLM]: Loading vocabulary from /root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab
[ConvLM]: vocabulary size of convLM 221452
I0823 22:16:39.344952  1083 Decode.cpp:248] [Decoder] LM constructed.
terminate called after throwing an instance of 'af::exception'
  what():  ArrayFire Exception (Device out of memory:101):
In function virtual void* cuda::Allocator::nativeAlloc(size_t)
In file src/backend/cuda/memory.cpp:152
CUDA Error (2): out of memory

 0# 0x00007F47B6291B64 in /usr/local/lib/libafcuda.so.3
 1# 0x00007F47B6CEB467 in /usr/local/lib/libafcuda.so.3
 2# 0x00007F47B6292D06 in /usr/local/lib/libafcuda.so.3
 3# 0x00007F47B5D06FD5 in /usr/local/lib/libafcuda.so.3
 4# 0x00007F47B5D079A8 in /usr/local/lib/libafcuda.so.3
 5# af_get_device_ptr in /usr/local/lib/libafcuda.so.3
 6# void* af::array::device<void>() const in /usr/local/lib/libafcuda.so.3
 7# 0x00005636348F4F3C in wav2letter/build/Decoder
 8# 0x000056363496FEED in wav2letter/build/Decoder
 9# 0x000056363495B82F in wav2letter/build/Decoder
10# 0x000056363492C4FE in wav2letter/build/Decoder
11# 0x0000563634941205 in wav2letter/build/Decoder
12# 0x0000563634962F21 in wav2letter/build/Decoder
13# 0x000056363496307D in wav2letter/build/Decoder
14# 0x0000563634913C5A in wav2letter/build/Decoder
15# 0x000056363483E293
*** Aborted at 1598221003 (unix time) try "date -d @1598221003" if you are using GNU date ***
PC: @     0x7f4796035e97 gsignal
*** SIGABRT (@0x43b) received by PID 1083 (TID 0x7f47db905380) from PID 1083; stack trace: ***
    @     0x7f47d3c1b890 (unknown)
    @     0x7f4796035e97 gsignal
    @     0x7f4796037801 abort
    @     0x7f4796a2a957 (unknown)
    @     0x7f4796a30ab6 (unknown)
    @     0x7f4796a30af1 std::terminate()
    @     0x7f4796a30d24 __cxa_throw
    @     0x7f47b6c85728 af::array::device<>()
    @     0x5636348f4f3c fl::DevicePtr::DevicePtr()
    @     0x56363496feed fl::conv2d()
    @     0x56363495b82f fl::AsymmetricConv1D::forward()
    @     0x56363492c4fe fl::UnaryModule::forward()
    @     0x563634941205 fl::WeightNorm::forward()
    @     0x563634962f21 fl::Residual::forward()
    @     0x56363496307d fl::Residual::forward()
    @     0x563634913c5a fl::Sequential::forward()
    @     0x56363483e293 _ZZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEENKUlRKSt6vectorIiSaIiEES8_iiE_clES8_S8_ii
    @     0x56363483ef16 _ZNSt17_Function_handlerIFSt6vectorIfSaIfEERKS0_IiSaIiEES6_iiEZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEEUlS6_S6_iiE_E9_M_invokeERKSt9_Any_dataS6_S6_OiSI_
    @     0x5636347a00bd w2l::ConvLM::scoreWithLmIdx()
    @     0x5636347a0794 w2l::ConvLM::score()
    @     0x563634646346 main
    @     0x7f4796018b97 __libc_start_main
    @     0x5636346a900a _start
Aborted (core dumped)

Anyway to take care of the "out of memory" error? Thank you!!

tlikhomanenko commented 4 years ago

It is working now with saved emissions, cool! Now you have OOM with convlm forward, to fix it try to reduce --lm_memory=5000, try --lm_memory=2000 for example, if still OOM try to reduce more.

ML6634 commented 4 years ago

Thank @tlikhomanenko for the quick response! For reducing lm_memory, I have tried quite a few values: 2000, 1000, ..., and finally 1. Unfortunately, each time the OOM error is still there. The following is for --lm_memory=1:

root@5861d058cec8:~# wav2letter/build/Decoder   --flagsfile wav2letter/build/test_decode.cfg   --minloglevel 0   --logtostderr 1   --lmweight 1   --wordscore 0   --eosscore 0   --silscore 0   --unkscore 0   --smearing max
I0823 23:05:53.120163  1114 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg
I0823 23:05:53.120270  1114 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0823 23:05:53.764691  1114 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output]
    (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 )
    (1): View (-1 80 1 0)
    (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (3): ReLU
    (4): Dropout (0.000000)
    (5): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800]
    (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (12): ReLU
    (13): Dropout (0.000000)
    (14): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120]
    (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (22): ReLU
    (23): Dropout (0.000000)
    (24): LayerNorm ( axis : { 0 1 2 } , size : -1)
    (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440]
    (35): View (0 1440 1 0)
    (36): Reorder (1,0,3,2)
    (37): Linear (1440->9998) (with bias)
I0823 23:05:53.764780  1114 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion
I0823 23:05:53.764782  1114 Decode.cpp:84] [Network] Number of params: 203394122
I0823 23:05:53.764794  1114 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin
I0823 23:05:53.765208  1114 Decode.cpp:106] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=1; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0823 23:05:53.768092  1114 Decode.cpp:127] Number of classes (network): 9998
I0823 23:05:54.727926  1114 Decode.cpp:134] Number of words: 200001
I0823 23:05:54.727947  1114 Decode.cpp:136] Here begin
I0823 23:05:54.806488  1114 Decode.cpp:221] Here
I0823 23:05:54.806516  1114 Decode.cpp:230] Init device
I0823 23:05:54.806540  1114 Decode.cpp:232] [ConvLM]: Loading LM from /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
[ConvLM]: Loading vocabulary from /root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab
[ConvLM]: vocabulary size of convLM 221452
I0823 23:05:57.741475  1114 Decode.cpp:248] [Decoder] LM constructed.
terminate called after throwing an instance of 'af::exception'
  what():  ArrayFire Exception (Device out of memory:101):
In function virtual void* cuda::Allocator::nativeAlloc(size_t)
In file src/backend/cuda/memory.cpp:152
CUDA Error (2): out of memory

 0# 0x00007FBC96942B64 in /usr/local/lib/libafcuda.so.3
 1# 0x00007FBC9739C467 in /usr/local/lib/libafcuda.so.3
 2# 0x00007FBC96943D06 in /usr/local/lib/libafcuda.so.3
 3# 0x00007FBC963B7FD5 in /usr/local/lib/libafcuda.so.3
 4# 0x00007FBC963B89A8 in /usr/local/lib/libafcuda.so.3
 5# af_get_device_ptr in /usr/local/lib/libafcuda.so.3
 6# void* af::array::device<void>() const in /usr/local/lib/libafcuda.so.3
 7# 0x000055AB403E3F3C in wav2letter/build/Decoder
 8# 0x000055AB4045EEED in wav2letter/build/Decoder
 9# 0x000055AB4044A82F in wav2letter/build/Decoder
10# 0x000055AB4041B4FE in wav2letter/build/Decoder
11# 0x000055AB40430205 in wav2letter/build/Decoder
12# 0x000055AB40451F21 in wav2letter/build/Decoder
13# 0x000055AB4045207D in wav2letter/build/Decoder
14# 0x000055AB40402C5A in wav2letter/build/Decoder
15# 0x000055AB4032D293
*** Aborted at 1598223962 (unix time) try "date -d @1598223962" if you are using GNU date ***
PC: @     0x7fbc766e6e97 gsignal
*** SIGABRT (@0x45a) received by PID 1114 (TID 0x7fbcbbfb6380) from PID 1114; stack trace: ***
    @     0x7fbcb42cc890 (unknown)
    @     0x7fbc766e6e97 gsignal
    @     0x7fbc766e8801 abort
    @     0x7fbc770db957 (unknown)
    @     0x7fbc770e1ab6 (unknown)
    @     0x7fbc770e1af1 std::terminate()
    @     0x7fbc770e1d24 __cxa_throw
    @     0x7fbc97336728 af::array::device<>()
    @     0x55ab403e3f3c fl::DevicePtr::DevicePtr()
    @     0x55ab4045eeed fl::conv2d()
    @     0x55ab4044a82f fl::AsymmetricConv1D::forward()
    @     0x55ab4041b4fe fl::UnaryModule::forward()
    @     0x55ab40430205 fl::WeightNorm::forward()
    @     0x55ab40451f21 fl::Residual::forward()
    @     0x55ab4045207d fl::Residual::forward()
    @     0x55ab40402c5a fl::Sequential::forward()
    @     0x55ab4032d293 _ZZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEENKUlRKSt6vectorIiSaIiEES8_iiE_clES8_S8_ii
    @     0x55ab4032df16 _ZNSt17_Function_handlerIFSt6vectorIfSaIfEERKS0_IiSaIiEES6_iiEZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEEUlS6_S6_iiE_E9_M_invokeERKSt9_Any_dataS6_S6_OiSI_
    @     0x55ab4028f0bd w2l::ConvLM::scoreWithLmIdx()
    @     0x55ab4028f794 w2l::ConvLM::score()
    @     0x55ab40135346 main
    @     0x7fbc766c9b97 __libc_start_main
    @     0x55ab4019800a _start
Aborted (core dumped)

Any other way to avoid the OOM? Or, anybody has ever run though this on a computer with 1 GPU (4 GB)? Thank you again!