@ML6634 could you give more details about what you are running? Are you running the Decode or the Train binary?
Thanks @tlikhomanenko for asking. I got it when I ran Decode (see below). If you need more information, please feel free to let me know. Thank you!
root@94fa6edf8e14:~# wav2letter/build/Decoder --flagsfile wav2letter/recipes/models/sota/2019/librispeech/decode_tds_ctc_gcnn_clean.cfg --minloglevel=0 --logtostderr=1 I0620 03:38:50.810289 90 Decode.cpp:58] Reading flags from file wav2letter/recipes/models/sota/2019/librispeech/decode_tds_ctc_gcnn_clean.cfg I0620 03:38:50.810439 90 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0620 03:38:51.475615 90 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output] (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 ) (1): View (-1 80 1 0) (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (3): ReLU (4): Dropout (0.000000) (5): LayerNorm ( axis : { 0 1 2 } , size : -1) (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (12): ReLU (13): Dropout (0.000000) (14): LayerNorm ( axis : { 0 1 2 } , size : -1) (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (22): ReLU (23): Dropout (0.000000) (24): LayerNorm ( axis : { 0 1 2 } , size : -1) (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (35): View (0 1440 1 0) (36): Reorder (1,0,3,2) (37): Linear (1440->9998) (with bias) I0620 03:38:51.475695 90 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion I0620 03:38:51.475713 90 Decode.cpp:84] [Network] Number of params: 203394122 I0620 03:38:51.475725 90 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0620 03:38:51.476186 90 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; 
--adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=250; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=/root/w2l/lists; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/recipes/models/sota/2019/librispeech/decode_tds_ctc_gcnn_clean.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab; --lmtype=convlm; --lmweight=0.77112819889331996; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=8; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=/root/w2l/saved_models/am_tds_ctc_librispeech; --seed=2; --show=true; --showletters=true; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --use_memcache=false; --uselexicon=true; --usewordpiece=true; 
--valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --warmup=8000; --weightdecay=0; --wordscore=0.35770072381611001; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0620 03:38:51.479128 90 Decode.cpp:127] Number of classes (network): 9998 I0620 03:38:52.504566 90 Decode.cpp:134] Number of words: 200001 I0620 03:38:52.588066 90 Decode.cpp:231] [ConvLM]: Loading LM from /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin [ConvLM]: Loading vocabulary from /root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab [ConvLM]: vocabulary size of convLM 221452 I0620 03:38:55.685602 90 Decode.cpp:247] [Decoder] LM constructed. Unable to allocate memory with native alloc for size 41943040 bytes with error 'ArrayFire Exception (Device out of memory:101): ArrayFire error: In function fl::MemoryManagerInstaller::MemoryManagerInstaller(std::shared_ptr<fl::MemoryManagerAdapter>)::<lambda(size_t)> In file /root/flashlight/flashlight/memory/MemoryManagerInstaller.cpp:178'terminate called after throwing an instance of 'af::exception' what(): ArrayFire Exception (Unknown error:999): In function T* af::array::device() const [with T = void] In file src/api/cpp/array.cpp:1024 *** Aborted at 1592624340 (unix time) try "date -d @1592624340" if you are using GNU date *** PC: @ 0x7f077ba1ae97 gsignal *** SIGABRT (@0x5a) received by PID 90 (TID 0x7f07c12ea380) from PID 90; stack trace: *** @ 0x7f07b9600890 (unknown) @ 0x7f077ba1ae97 gsignal @ 0x7f077ba1c801 abort @ 0x7f077c40f957 (unknown) @ 0x7f077c415ab6 (unknown) @ 0x7f077c415af1 std::terminate() @ 0x7f077c415d24 __cxa_throw @ 0x7f079c66a728 af::array::device<>() @ 0x55793ed1bc4c fl::DevicePtr::DevicePtr() @ 0x55793ed9823d fl::conv2d() @ 0x55793ed83c0f fl::AsymmetricConv1D::forward() @ 0x55793ed57cfe fl::UnaryModule::forward() @ 0x55793ed699b5 fl::WeightNorm::forward() @ 0x55793ed8b2a1 fl::Residual::forward() @ 0x55793ed8b3fd fl::Residual::forward() @ 0x55793ed4369a fl::Sequential::forward() @ 0x55793ec62a4a _ZZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEENKUlRKSt6vectorIiSaIiEES8_iiE_clES8_S8_ii @ 0x55793ec63866 _ZNSt17_Function_handlerIFSt6vectorIS0_IfSaIfEESaIS2_EERKS0_IiSaIiEES8_iiEZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEEUlS8_S8_iiE_E9_M_invokeERKSt9_Any_dataS8_S8_OiSK_ @ 0x55793ebc6ba0 w2l::ConvLM::scoreWithLmIdx() @ 0x55793ebc7264 w2l::ConvLM::score() @ 0x55793ea7a9f6 main @ 0x7f077b9fdb97 __libc_start_main @ 0x55793ead6d2a _start Aborted (core dumped)
What is the GPU memory on your machine? Try setting lm_memory=500.
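For example, as a line in the decode config or on the command line (just a sketch; 500 is only a suggested value to try):
--lm_memory=500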
The GPU memory on my machine is 4 GB:
~$ glxinfo | egrep -i 'device|memory' GLX_NV_multigpu_context, GLX_NV_robustness_video_memory_purge, GLX_NV_robustness_video_memory_purge, GLX_NV_swap_group, GLX_NV_multigpu_context, GLX_NV_robustness_video_memory_purge, Memory info (GL_NVX_gpu_memory_info): Dedicated video memory: 4096 MB Total available memory: 4096 MB ...
Also see: https://github.com/facebookresearch/wav2letter/issues/683#issue-633790334.
I tried setting lm_memory=500, ..., and even down to 1; it produced the same error each time. Are there any other ways? Thank you!
Well, the problem is that both the AM and the LM are loaded on the GPU. For CTC models we remove the AM from the GPU as soon as the predictions are ready, but the LM is loaded before the AM (for example, to make sure your program will not crash during LM loading and the additional decoder setup after the AM forward).
A solution for you to try, given the GPU memory restriction you have:
1) run the Test binary with --emission_dir set, so that the emissions (AM predictions) are saved to disk;
2) then run the Decoder with the same --emission_dir: in this case the AM forward will not be run and the predictions will be read from the files saved on disk in the emission_dir.
Docs on Test and on the emission dir are on the decoder wiki page: https://github.com/facebookresearch/wav2letter/wiki/Beam-Search-Decoder.
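Schematically, the two steps would look something like this (just a sketch with placeholder paths; the exact flags for your setup are spelled out further down in this thread):
wav2letter/build/Test \
--am path/to/am.bin \
--tokensdir path/to/tokens/dir \
--tokens tokens.txt \
--lexicon path/to/the/lexicon/file \
--test path/to/test/list/file \
--emission_dir path/where/to/store/emissions
and then
wav2letter/build/Decoder \
--flagsfile path/to/decode.cfg \
--emission_dir path/where/to/store/emissions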
I ran
root@94fa6edf8e14:~# wav2letter/build/Test \
> --am /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin\
> --maxload 10 \
> --emission_dir /root/w2l/saved_models/pre-trained_acoustic_models \
> --test test-clean.lst
and
root@94fa6edf8e14:~# wav2letter/build/Test \
> --am /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin\
> --maxload 10 \
> --test test-clean.lst
Both gave an error like:
terminate called after throwing an instance of 'std::runtime_error' what(): Invalid dictionary filepath specified. *** Aborted at 1592894594 (unix time) try "date -d @1592894594" if you are using GNU date ***
Any ideas? Thank you!
Did you read the docs at https://github.com/facebookresearch/wav2letter/wiki/Beam-Search-Decoder?
When running the Test binary, the AM is loaded and all of its saved flags are used unless you specify them on the command line, for example the tokens and lexicon paths. So, in case you want to overwrite them, you should specify them directly:
wav2letter/build/Test \
--am path/to/train/am.bin \
--maxload 10 \
--test path/to/test/list/file \
--tokensdir path/to/tokens/dir \
--tokens tokens.txt \
--lexicon path/to/the/lexicon/file
When I ran Decoder, I got an error:
root@02ae45d9e4ed:~# wav2letter/build/Decoder \ > --flagsfile decode.cfg \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max E0627 04:58:07.188414 271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux E0627 04:58:08.188877 271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux E0627 04:58:10.189286 271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux E0627 04:58:14.189731 271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux E0627 04:58:22.190088 271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux E0627 04:58:38.190415 271 Serial.h:77] Error while loading "/root/wav2letter/src/decoder/test/emission.bin": basic_string::_M_replace_aux terminate called after throwing an instance of 'std::length_error' what(): basic_string::_M_replace_aux *** Aborted at 1593233918 (unix time) try "date -d @1593233918" if you are using GNU date *** PC: @ 0x7fc4b61d3e97 gsignal *** SIGABRT (@0x10f) received by PID 271 (TID 0x7fc4fbaa3380) from PID 271; stack trace: *** @ 0x7fc4f3db9890 (unknown) @ 0x7fc4b61d3e97 gsignal @ 0x7fc4b61d5801 abort @ 0x7fc4b6bc8957 (unknown) @ 0x7fc4b6bceab6 (unknown) @ 0x7fc4b6bceaf1 std::terminate() @ 0x7fc4b6bced79 __cxa_rethrow @ 0x55d976031427 _ZN3w2l16retryWithBackoffIRFvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt13unordered_mapIS6_S6_St4hashIS6_ESt8equal_toIS6_ESaISt4pairIS7_S6_EEERSt10shared_ptrIN2fl6ModuleEERSJ_INS_17SequenceCriterionEEEJS8_SI_SN_SQ_EEENSt9result_ofIFT_DpT0_EE4typeENSt6chrono8durationIdSt5ratioILl1ELl1EEEEdlOSU_DpOSV_.constprop.7511 @ 0x55d975fcf717 main @ 0x7fc4b61b6b97 __libc_start_main @ 0x55d97602dd2a _start Aborted (core dumped)
My decode.cfg is:
--am=/root/wav2letter/src/decoder/test/emission.bin --test=/root/w2l/lists/test-clean.lst --maxload=10 --nthread_decoder=2 --show --sholetters --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon --uselexicon=true --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin --lmtype=kenlm --decodertype=wrd --beamsize=100 --beamsizetoken=100 --beamthreshold=20
In addition,
root@02ae45d9e4ed:~/wav2letter/src/decoder/test# ls -l total 4344 -rw-r--r-- 1 root root 6715 May 3 03:20 DecoderTest.cpp -rw-r--r-- 1 root root 8 May 3 03:20 TN.bin -rw-r--r-- 1 root root 27260 May 3 03:20 emission.bin -rw-r--r-- 1 root root 56 May 3 03:20 letters.lst -rw-r--r-- 1 root root 3727822 May 3 03:20 lm.arpa -rw-r--r-- 1 root root 3364 May 3 03:20 transition.bin -rw-r--r-- 1 root root 663876 May 3 03:20 words.lst
May I ask how to take care of the above bug? Thank you!
First, ~/wav2letter/src/decoder/test
contains data for the decoder tests; you cannot use it the way you are doing. emission.bin stores emission data, not an acoustic model.
What you need to do: 1) run the Test binary on 10 samples with the AM you downloaded, like:
wav2letter/build/Test \
--am path/to/am.bin \
--maxload 10 \
--test path/to/test/list/file \
--tokensdir path/to/tokens/dir \
--tokens tokens.txt \
--lexicon path/to/the/lexicon/file \
--emission_dir path/where/to/store/emissions
2) Run the Decoder, specifying the additional parameter --emission_dir=path/where/to/store/emissions in the config, using the same path as above.
For step 1) you can use the parameter values from the decoder config you used above, wav2letter/recipes/models/sota/2019/librispeech/decode_tds_ctc_gcnn_clean.cfg.
Are the steps clearer now?
Thanks @tlikhomanenko for the help!
The Test:
root@a9a8aeab9c4c:~# wav2letter/build/Test \
> --am /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin \
> --maxload 10 \
> --test /root/w2l/lists/test-clean.lst \
> --tokensdir /root/w2l/am \
> --tokens librispeech-train-all-unigram-10000.tokens \
> --lexicon /root/w2l/am/librispeech-train+dev-unigram-10000-nbest10.lexicon \
> --emission_dir /root/w2l/saved_models/pre-trained_acoustic_models
went through.
However,
root@a9a8aeab9c4c:~# wav2letter/build/Decoder \ > --flagsfile test_decode.cfg \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max terminate called after throwing an instance of 'std::runtime_error' what(): Invalid dictionary filepath specified. *** Aborted at 1593382447 (unix time) try "date -d @1593382447" if you are using GNU date *** PC: @ 0x7f1fe3ea8e97 gsignal *** SIGABRT (@0xf8) received by PID 248 (TID 0x7f2029778380) from PID 248; stack trace: *** @ 0x7f2021a8e890 (unknown) @ 0x7f1fe3ea8e97 gsignal @ 0x7f1fe3eaa801 abort @ 0x7f1fe489d957 (unknown) @ 0x7f1fe48a3ab6 (unknown) @ 0x7f1fe48a3af1 std::terminate() @ 0x7f1fe48a3d24 __cxa_throw @ 0x555bff8278bd main @ 0x7f1fe3e8bb97 __libc_start_main @ 0x555bff882d2a _start Aborted (core dumped)
test_decode.cfg is:
--emission_dir=/root/w2l/saved_models/pre-trained_acoustic_models --test=/root/w2l/lists/test-clean.lst --maxload=10 --nthread_decoder=2 --show --sholetters --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon --uselexicon=true --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin --lmtype=kenlm --decodertype=wrd --beamsize=100 --beamsizetoken=100 --beamthreshold=20
Any comments about the bug? Thank you!
You still need to add to the decoder config:
--tokensdir /root/w2l/am
--tokens librispeech-train-all-unigram-10000.tokens
Thanks @tlikhomanenko for pointing it out!
I added:
--tokensdir=/root/w2l/am --tokens=librispeech-train-all-unigram-10000.tokens
into the config. Then, when I ran Decoder, I got an error associated with memory access:
root@a9a8aeab9c4c:~# wav2letter/build/Decoder --flagsfile test_decode.cfg --lmweight 1 --wordscore 0 --eosscore 0 --silscore 0 --unkscore 0 --smearing max *** Aborted at 1593398213 (unix time) try "date -d @1593398213" if you are using GNU date *** PC: @ 0x55ef442d8248 fl::Module::param() *** SIGSEGV (@0x8) received by PID 268 (TID 0x7f197f510380) from PID 8; stack trace: *** @ 0x7f1977826890 (unknown) @ 0x55ef442d8248 fl::Module::param() @ 0x55ef43ff93ba main @ 0x7f1939c23b97 __libc_start_main @ 0x55ef44056d2a _start Segmentation fault (core dumped)
Please set am to empty: --am=
I added it in. Unfortunately, the error stayed the same.
So, are there any other ways to deal with this? Thank you!
Can you post the full log?
001_log:
epoch: 1 | nupdates: 10001 | lr: 0.000000 | lrcriterion: 0.000000 | runtime: 02:29:40 | bch(ms): 897.95 | smp(ms): 0.48 | fwd(ms): 285.81 | crit-fwd(ms): 9.45 | bwd(ms): 377.39 | optim(ms): 230.94 | loss: 49.28250 | train-TER: 103.14 | train-WER: 102.21 | dev-clean-loss: 44.42949 | dev-clean-TER: 100.00 | dev-clean-WER: 100.00 | dev-other-loss: 41.85028 | dev-other-TER: 100.00 | dev-other-WER: 100.00 | avg-isz: 1222 | avg-tsz: 037 | max-tsz: 080 | hrs: 33.97 | thrpt(sec/sec): 13.62
Thank you!
I suspect the problem is in the random samples for which the emissions are saved. Can you make a new list with the 10 head samples from your original list, run Test again on this new list, specifying maxload=-1,
and then run Decode with this new list and maxload=-1 too.
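Something like this should work for building the new list (a sketch; the output name is just an example, and the paths match those used elsewhere in this thread):
head -n 10 /root/w2l/lists/test-clean.lst > /root/w2l/lists/new-test-clean.lst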
My new list:
When I ran Test with the new list, I got an error:
root@cfcc3017cc69:~# wav2letter/build/Test \ > --am /root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin \ > --maxload -1 \ > --test /root/w2l/lists/new-test-clean.lst \ > --tokensdir /root/w2l/am \ > --tokens librispeech-train-all-unigram-10000.tokens \ > --lexicon /root/w2l/am/librispeech-train+dev-unigram-10000-nbest10.lexicon \ > --emission_dir /root/w2l/saved_models/pre-trained_acoustic_models I0702 16:10:35.827715 187 Test.cpp:83] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/saved_models/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/saved_models/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO/100.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/am/librispeech-train+dev-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=false; --showletters=false; --silscore=0; --smearing=none; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; 
--surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --warmup=8000; --weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0702 16:10:35.830458 187 Test.cpp:104] Number of classes (network): 9998 I0702 16:10:36.297570 187 Test.cpp:111] Number of words: 89333 F0702 16:10:36.470782 187 W2lListFilesDataset.cpp:116] Cannot parse *** Check failure stack trace: *** @ 0x7f2ecfb240cd google::LogMessage::Fail() @ 0x7f2ecfb25f33 google::LogMessage::SendToLog() @ 0x7f2ecfb23c28 google::LogMessage::Flush() @ 0x7f2ecfb26999 google::LogMessageFatal::~LogMessageFatal() @ 0x563e4894fdcb w2l::W2lListFilesDataset::loadListFile() @ 0x563e4895092d w2l::W2lListFilesDataset::W2lListFilesDataset() @ 0x563e4897459e w2l::createDataset() @ 0x563e48773e00 main @ 0x7f2ecee09b97 __libc_start_main @ 0x563e487d081a _start Aborted (core dumped)
What is wrong with that? Thank you!
Did you take the head of the file, or did you copy it some other way? Your file cannot be parsed, which means its format differs from the expected one. Make sure it has the same format as your original list.
Thanks @tlikhomanenko for the quick response! I chose the 10 samples from test-clean.lst.
Right now the original Test with test-clean.lst works, but it does not work with the new list and maxload=-1. Even if I copy and paste the top 10 samples from test-clean.lst to make the new list, Test does not work with that new list.
I checked a few times. It seems the new list and the original list, test-clean.lst, have the same format (tabs, blank spaces, end-of-line markers, etc.).
Can you try removing the emission dir and running it again?
You can also try with the tail of the file, just to debug.
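For the tail variant, a sketch along the same lines (again using the paths from elsewhere in this thread):
tail -n 10 /root/w2l/lists/test-clean.lst > /root/w2l/lists/new-test-clean.lst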
Thanks @tlikhomanenko for the help and patience! The original list does not have an end-of-line marker right after the last sample, but mine did. I fixed that, and Test went through. Unfortunately, when I then ran Decoder with test_decode.cfg:
--emission_dir=/root/w2l/saved_models/pre-trained_acoustic_models --tokensdir=/root/w2l/am --tokens=librispeech-train-all-unigram-10000.tokens --test=/root/w2l/lists/new-test-clean.lst --maxload=-1 --nthread_decoder=2 --show --sholetters --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon --uselexicon=true --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin --lmtype=kenlm --decodertype=wrd --beamsize=100 --beamsizetoken=100 --beamthreshold=20
I got the same error as above (see https://github.com/facebookresearch/wav2letter/issues/711#issuecomment-650875083):
root@cfcc3017cc69:~# wav2letter/build/Decoder \ > --flagsfile wav2letter/build/test_decode.cfg \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max *** Aborted at 1593719110 (unix time) try "date -d @1593719110" if you are using GNU date *** PC: @ 0x560a4f4aa248 fl::Module::param() *** SIGSEGV (@0x8) received by PID 311 (TID 0x7efd9471b380) from PID 8; stack trace: *** @ 0x7efd8ca31890 (unknown) @ 0x560a4f4aa248 fl::Module::param() @ 0x560a4f1cb3ba main @ 0x7efd4ee2eb97 __libc_start_main @ 0x560a4f228d2a _start Segmentation fault (core dumped)
You are using convlm, not kenlm, so it should be
--lmtype=convlm
Thanks @tlikhomanenko for the patience and help! On my new hard drive, I ran Test on a list with the 10 head samples from the original list, and it went through. With the change
--lmtype=convlm
in the decoder config:
--am= --emission_dir=/root/w2l/pre-trained_acoustic_models --tokensdir=/root/w2l/am --tokens=librispeech-train-all-unigram-10000.tokens --test=/root/w2l/lists/new-test-clean.lst --maxload=-1 --nthread_decoder=2 --show --sholetters --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon --uselexicon=true --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin --lmtype=convlm --decodertype=wrd --beamsize=100 --beamsizetoken=100 --beamthreshold=20
the error stayed the same when I ran Decoder.
Can you post the full screen output of your decoder run?
Thanks for asking! Here you go:
root@e90078050412:~# wav2letter/build/Decoder \ > --flagsfile wav2letter/build/test_decode.cfg \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max *** Aborted at 1596343161 (unix time) try "date -d @1596343161" if you are using GNU date *** PC: @ 0x55cd76e8f528 (unknown) *** SIGSEGV (@0x8) received by PID 120 (TID 0x7fadbb7c3380) from PID 8; stack trace: *** @ 0x7fadb3ad9890 (unknown) @ 0x55cd76e8f528 (unknown) @ 0x55cd76bb3473 (unknown) @ 0x7fad75ed6b97 __libc_start_main @ 0x55cd76c13b2a (unknown) Segmentation fault (core dumped)
Anyone who runs successfully the TDS CTC model or a similar wav2letter model on a machine with only 1 GPU (4 GB)? If so, how? Thank you!
Can you add --minloglevel=0 --logtostderr=1
to see the full log on the screen (you should see, for example, data/model loading, as well as all the flags printed on the screen)?
And again, could you confirm that the decode command works for you with KenLM n-gram model decoding (both without saving emissions and with saved emissions)?
root@e90078050412:~# wav2letter/build/Decoder \ > --flagsfile wav2letter/build/test_decode.cfg \ > --minloglevel 0 \ > --logtostderr 1 \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max I0805 09:07:37.017644 122 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0805 09:07:37.017838 122 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=default; --archdir=; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=9223372036854775807; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=1; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=false; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=2; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=; --runname=; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=false; --stepsize=9223372036854775807; --surround=; --tag=; --target=tkn; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; 
--world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0805 09:07:37.020831 122 Decode.cpp:127] Number of classes (network): 9997 I0805 09:07:38.188450 122 Decode.cpp:134] Number of words: 200001 *** Aborted at 1596618458 (unix time) try "date -d @1596618458" if you are using GNU date *** PC: @ 0x5586ff2e0528 (unknown) *** SIGSEGV (@0x8) received by PID 122 (TID 0x7f1d39b0c380) from PID 8; stack trace: *** @ 0x7f1d31e22890 (unknown) @ 0x5586ff2e0528 (unknown) @ 0x5586ff004473 (unknown) @ 0x7f1cf421fb97 __libc_start_main @ 0x5586ff064b2a (unknown) Segmentation fault (core dumped)
I successfully ran the model (so both the train and decode commands went through):
E2E Speech Recognition on Librispeech-Clean Dataset https://github.com/facebookresearch/wav2letter/tree/master/tutorials/1-librispeech_clean
So, would you like me to try other decode commands? Thank you!
I want you to run exactly the same decoding command as you are running now, but with the KenLM model, not ConvLM. Also try it without saving emissions.
Other necessary fixes for the convlm config:
--lm_vocab
(for convlm, this is the mapping between the convlm prediction indices and tokens)
--nthread_decoder=1
(because you have only 1 GPU and the convlm will be run on the GPU)
Also, could you add LOG(INFO) << "Here"; and, before this row https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L230, LOG(INFO) << "Init device";, then recompile, run again and post the log here?
Thanks @tlikhomanenko for the kind help! Inside the Docker container, I modified Decode.cpp, as suggested by @tlikhomanenko, in the wav2letter folder. When I compiled it, I got:
root@3f9d46061788:~/wav2letter# g++ Decode.cpp In file included from /usr/local/include/flashlight/autograd/Variable.h:25:0, from /usr/local/include/flashlight/autograd/Utils.h:11, from /usr/local/include/flashlight/autograd/autograd.h:20, from /usr/local/include/flashlight/flashlight.h:11, from Decode.cpp:17: /usr/local/include/flashlight/common/Serialization.h:23:10: fatal error: cereal/access.hpp: No such file or directory #include <cereal/access.hpp> ^~~~~~~~~~~~~~~~~~~ compilation terminated. root@3f9d46061788:~/wav2letter# gcc Decode.cpp In file included from /usr/local/include/flashlight/autograd/Variable.h:25:0, from /usr/local/include/flashlight/autograd/Utils.h:11, from /usr/local/include/flashlight/autograd/autograd.h:20, from /usr/local/include/flashlight/flashlight.h:11, from Decode.cpp:17: /usr/local/include/flashlight/common/Serialization.h:23:10: fatal error: cereal/access.hpp: No such file or directory #include <cereal/access.hpp> ^~~~~~~~~~~~~~~~~~~ compilation terminated.
Any comments? What is the right command to compile it? Thank you!
You should follow these compilation steps:
https://github.com/facebookresearch/wav2letter/blob/master/Dockerfile-CUDA#L22
To compile it, is it correct to run the following command in the wav2letter folder?
$ sudo docker build -f Dockerfile-CUDA --no-cache .
Thank you!
You don't need to rebuild the full image. Just go to the container where you are running everything, change the code there as I pointed out, and then run these commands:
export MKLROOT=/opt/intel/mkl && export KENLM_ROOT_DIR=/root/kenlm && \
cd /root/wav2letter && mkdir -p build && \
cd build && cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_LIBRARIES_USE_CUDA=ON -DW2L_BUILD_INFERENCE=ON && \
make -j$(nproc)
Thanks @tlikhomanenko for the continued help! With the above changes, for the convlm model, I got:
root@0614c5da94b4:~# wav2letter/build/Decoder \ > --flagsfile wav2letter/build/test_decode.cfg \ > --minloglevel 0 \ > --logtostderr 1 \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max I0808 20:32:21.031849 1014 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0808 20:32:21.032115 1014 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=default; --archdir=; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=9223372036854775807; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=1; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=false; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=; --runname=; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=false; --stepsize=9223372036854775807; --surround=; --tag=; --target=tkn; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=; --validbatchsize=-1; --warmup=1; 
--weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0808 20:32:21.035308 1014 Decode.cpp:127] Number of classes (network): 9997 I0808 20:32:22.087042 1014 Decode.cpp:134] Number of words: 200001 *** Aborted at 1596918742 (unix time) try "date -d @1596918742" if you are using GNU date *** PC: @ 0x557fd9df29c8 (unknown) *** SIGSEGV (@0x8) received by PID 1014 (TID 0x7f3d357a2380) from PID 8; stack trace: *** @ 0x7f3d2dab8890 (unknown) @ 0x557fd9df29c8 (unknown) @ 0x557fd9b0ab93 (unknown) @ 0x7f3cefeb5b97 __libc_start_main @ 0x557fd9b6ef3a (unknown) Segmentation fault (core dumped)
Also, with the above changes, for the kenlm model, with saved emissions, I got:
root@0614c5da94b4:~# wav2letter/build/Decoder \ > --flagsfile wav2letter/build/test_decode.cfg \ > --minloglevel 0 \ > --logtostderr 1 \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max I0808 20:36:01.440773 1017 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0808 20:36:01.441000 1017 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=default; --archdir=; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=9223372036854775807; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=1; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=false; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=; --runname=; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=false; --stepsize=9223372036854775807; --surround=; --tag=; --target=tkn; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=; --validbatchsize=-1; --warmup=1; 
--weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0808 20:36:01.444080 1017 Decode.cpp:127] Number of classes (network): 9997 I0808 20:36:02.533449 1017 Decode.cpp:134] Number of words: 200001 *** Aborted at 1596918962 (unix time) try "date -d @1596918962" if you are using GNU date *** PC: @ 0x560c880929c8 (unknown) *** SIGSEGV (@0x8) received by PID 1017 (TID 0x7f110128d380) from PID 8; stack trace: *** @ 0x7f10f95a3890 (unknown) @ 0x560c880929c8 (unknown) @ 0x560c87daab93 (unknown) @ 0x7f10bb9a0b97 __libc_start_main @ 0x560c87e0ef3a (unknown) Segmentation fault (core dumped)
For the kenlm model, with all those changes, but with the full path of --am added, and without saving emissions:
root@0614c5da94b4:~# wav2letter/build/Decoder \ > --flagsfile wav2letter/build/test_decode.cfg \ > --minloglevel 0 \ > --logtostderr 1 \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max I0808 21:13:20.241331 1034 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0808 21:13:20.241497 1034 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0808 21:13:20.908205 1034 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output] (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 ) (1): View (-1 80 1 0) (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (3): ReLU (4): Dropout (0.000000) (5): LayerNorm ( axis : { 0 1 2 } , size : -1) (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (12): ReLU (13): Dropout (0.000000) (14): LayerNorm ( axis : { 0 1 2 } , size : -1) (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (22): ReLU (23): Dropout (0.000000) (24): LayerNorm ( axis : { 0 1 2 } , size : -1) (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (35): View (0 1440 1 0) (36): Reorder (1,0,3,2) (37): Linear (1440->9998) (with bias) I0808 21:13:20.908303 1034 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion I0808 21:13:20.908306 1034 Decode.cpp:84] [Network] Number of params: 203394122 I0808 21:13:20.908358 1034 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0808 21:13:20.908658 1034 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; 
--version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --validbatchsize=-1; --warmup=1; --weightdecay=0; 
--wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0808 21:13:20.911502 1034 Decode.cpp:127] Number of classes (network): 9998 I0808 21:13:21.963410 1034 Decode.cpp:134] Number of words: 200001 I0808 21:13:22.048540 1034 Decode.cpp:221] Here Loading the LM will be faster if you build a binary file. Reading /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100 terminate called after throwing an instance of 'util::EndOfFileException' what(): End of file Byte: 0 *** Aborted at 1596921202 (unix time) try "date -d @1596921202" if you are using GNU date *** PC: @ 0x7f4b77689e97 gsignal *** SIGABRT (@0x40a) received by PID 1034 (TID 0x7f4bbcf59380) from PID 1034; stack trace: *** @ 0x7f4bb526f890 (unknown) @ 0x7f4b77689e97 gsignal @ 0x7f4b7768b801 abort @ 0x7f4b7807e957 (unknown) @ 0x7f4b78084ab6 (unknown) @ 0x7f4b78084af1 std::terminate() @ 0x7f4b78084d79 __cxa_rethrow @ 0x5625a8d4bf84 lm::ngram::detail::GenericModel<>::InitializeFromARPA() @ 0x5625a8d4ddc5 lm::ngram::detail::GenericModel<>::GenericModel() @ 0x5625a8d44fab lm::ngram::LoadVirtual() @ 0x5625a8c6a649 w2l::KenLM::KenLM() @ 0x5625a8b14480 main @ 0x7f4b7766cb97 __libc_start_main @ 0x5625a8b76f3a _start Aborted (core dumped)
Why are you setting the convlm bin model as the lm path while you are using kenlm? Please be careful when providing flags for your run.
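For reference, a minimal sketch of the two consistent flag combinations (the convlm paths are the ones used later in this thread; the kenlm path is only a placeholder for whatever n-gram LM file you actually have):
# n-gram decoding with KenLM
--lmtype=kenlm
--lm=/root/w2l/decoder/<your_kenlm_ngram.arpa_or_bin>   # placeholder, not a real file from this thread
# neural LM decoding with ConvLM
--lmtype=convlm
--lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin
--lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab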
Thanks @tlikhomanenko for the continued help! With the above changes, for the convlm model, I got:
root@0614c5da94b4:~# wav2letter/build/Decoder \
--flagsfile wav2letter/build/test_decode.cfg \ --minloglevel 0 \ --logtostderr 1 \ --lmweight 1 \ --wordscore 0 \ --eosscore 0 \ --silscore 0 \ --unkscore 0 \ --smearing max I0808 20:32:21.031849 1014 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0808 20:32:21.032115 1014 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=default; --archdir=; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=9223372036854775807; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=1; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=false; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=; --runname=; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=false; --stepsize=9223372036854775807; --surround=; --tag=; --target=tkn; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=; --validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; 
--world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0808 20:32:21.035308 1014 Decode.cpp:127] Number of classes (network): 9997 I0808 20:32:22.087042 1014 Decode.cpp:134] Number of words: 200001 Aborted at 1596918742 (unix time) try "date -d @1596918742" if you are using GNU date PC: @ 0x557fd9df29c8 (unknown) SIGSEGV (@0x8) received by PID 1014 (TID 0x7f3d357a2380) from PID 8; stack trace: @ 0x7f3d2dab8890 (unknown) @ 0x557fd9df29c8 (unknown) @ 0x557fd9b0ab93 (unknown) @ 0x7f3cefeb5b97 __libc_start_main @ 0x557fd9b6ef3a (unknown) Segmentation fault (core dumped)
It seems you didn't recompile. Just to be sure, also add LOG(INFO) << "Here begin";
at the line https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L136, then compile and run again.
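A minimal rebuild sketch, assuming a make-based build in wav2letter/build and the Decoder target shown in the build output below (adjust to your checkout):
cd wav2letter/build
make -j$(nproc) Decoder   # rebuild the Decoder binary after editing Decode.cpp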
I added those three statements to Decode.cpp, and the build went through:
... [ 87%] Built target inference_IdentityTest [ 87%] Built target inference_Conv1dTest [ 88%] Built target inference_LayerNormTest [ 90%] Built target inference_LinearTest [ 91%] Built target audio_to_words_example [ 93%] Built target inference_TDSBlockTest [ 94%] Built target multithreaded_streaming_asr_example [ 96%] Built target interactive_streaming_asr_example [ 96%] Built target simple_streaming_asr_example [ 97%] Built target wav2letter-inference [ 97%] Linking CXX executable Decoder [100%] Built target Decoder
For the convlm model, I got:
root@73b1494e425d:~# wav2letter/build/Decoder \ > --flagsfile wav2letter/build/test_decode.cfg \ > --minloglevel 0 \ > --logtostderr 1 \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max I0816 23:30:44.348644 2278 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0816 23:30:44.507551 2278 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0816 23:33:30.168771 2278 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output] (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 ) (1): View (-1 80 1 0) (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (3): ReLU (4): Dropout (0.000000) (5): LayerNorm ( axis : { 0 1 2 } , size : -1) (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (12): ReLU (13): Dropout (0.000000) (14): LayerNorm ( axis : { 0 1 2 } , size : -1) (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (22): ReLU (23): Dropout (0.000000) (24): LayerNorm ( axis : { 0 1 2 } , size : -1) (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (35): View (0 1440 1 0) (36): Reorder (1,0,3,2) (37): Linear (1440->9998) (with bias) I0816 23:33:30.169116 2278 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion I0816 23:33:30.169129 2278 Decode.cpp:84] [Network] Number of params: 203394122 I0816 23:33:30.169189 2278 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0816 23:33:30.170029 2278 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; 
--version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; 
--validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0816 23:33:30.869755 2278 Decode.cpp:127] Number of classes (network): 9998 I0816 23:33:51.396348 2278 Decode.cpp:134] Number of words: 200001 I0816 23:33:51.396366 2278 Decode.cpp:136] Here begin I0816 23:33:51.493278 2278 Decode.cpp:221] Here I0816 23:33:51.493304 2278 Decode.cpp:230] Init device I0816 23:33:51.493319 2278 Decode.cpp:232] [ConvLM]: Loading LM from /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin E0816 23:33:52.454282 2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0 E0816 23:33:53.454519 2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0 E0816 23:33:55.454686 2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0 E0816 23:33:59.454841 2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0 E0816 23:34:07.455020 2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0 E0816 23:34:23.455188 2278 Serial.h:77] Error while loading "/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin": Failed to read 8 bytes from input stream! Read 0 terminate called after throwing an instance of 'cereal::Exception' what(): Failed to read 8 bytes from input stream! Read 0 *** Aborted at 1597620863 (unix time) try "date -d @1597620863" if you are using GNU date *** PC: @ 0x7f6daa0f4e97 gsignal *** SIGABRT (@0x8e6) received by PID 2278 (TID 0x7f6def9c4380) from PID 2278; stack trace: *** @ 0x7f6de7cda890 (unknown) @ 0x7f6daa0f4e97 gsignal @ 0x7f6daa0f6801 abort @ 0x7f6daaae9957 (unknown) @ 0x7f6daaaefab6 (unknown) @ 0x7f6daaaefaf1 std::terminate() @ 0x7f6daaaefd79 __cxa_rethrow @ 0x55701c0dc260 _ZN3w2l16retryWithBackoffIRFvRKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEERSt10shared_ptrIN2fl6ModuleEEEJS8_SD_EEENSt9result_ofIFT_DpT0_EE4typeENSt6chrono8durationIdSt5ratioILl1ELl1EEEEdlOSH_DpOSI_ @ 0x55701c05c245 main @ 0x7f6daa0d7b97 __libc_start_main @ 0x55701c0c000a _start Aborted (core dumped)
Why did you previously get a different error when using convlm? It wasn't a cereal error.
Good question! Yesterday, I ran the docker image with CUDA in a new container, entered the container, then ran Train, Test, and Decode by copying and pasting the code from my record into the container. Actually, I have gone through the same process quite a few times, and each time the bug for Decode stayed the same, except that yesterday it produced a cereal bug instead. For the two most recent executions of Decode for the convlm model (i.e., https://github.com/facebookresearch/wav2letter/issues/711#issuecomment-670971073 and https://github.com/facebookresearch/wav2letter/issues/711#issuecomment-674593031), I added --lm_vocab and changed --nthread_decoder to 1 in the Decode config file, per the suggestion of @tlikhomanenko.
Any way to get rid of the cereal bug? Thank you!!
I have no idea what you are doing/changing. We didn't change anything that could affect reading the convlm model bin, so you should not be getting a cereal bug.
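If it helps to rule out a corrupted local copy, one quick sanity check (a rough sketch; the download URL is left as a placeholder to be taken from the sota/2019 recipe, and a published checksum may not exist):
ls -l /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin   # note the current file size
wget -O /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin <URL from the sota/2019 recipe>
ls -l /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin   # compare sizes before/after re-download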
It sounds like something was wrong with my file lm_librispeech_convlm_word_14B.bin, so I re-downloaded it, and that fixed the cereal error. Then, I got:
root@5861d058cec8:~# wav2letter/build/Decoder \ > --flagsfile wav2letter/build/test_decode.cfg \ > --minloglevel 0 \ > --logtostderr 1 \ > --lmweight 1 \ > --wordscore 0 \ > --eosscore 0 \ > --silscore 0 \ > --unkscore 0 \ > --smearing max I0823 22:16:34.447176 1083 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0823 22:16:34.447362 1083 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0823 22:16:35.110081 1083 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output] (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 ) (1): View (-1 80 1 0) (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (3): ReLU (4): Dropout (0.000000) (5): LayerNorm ( axis : { 0 1 2 } , size : -1) (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (12): ReLU (13): Dropout (0.000000) (14): LayerNorm ( axis : { 0 1 2 } , size : -1) (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (22): ReLU (23): Dropout (0.000000) (24): LayerNorm ( axis : { 0 1 2 } , size : -1) (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (35): View (0 1440 1 0) (36): Reorder (1,0,3,2) (37): Linear (1440->9998) (with bias) I0823 22:16:35.110160 1083 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion I0823 22:16:35.110162 1083 Decode.cpp:84] [Network] Number of params: 203394122 I0823 22:16:35.110174 1083 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0823 22:16:35.110532 1083 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; 
--version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=5000; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; 
--validbatchsize=-1; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0823 22:16:35.113325 1083 Decode.cpp:127] Number of classes (network): 9998 I0823 22:16:36.147918 1083 Decode.cpp:134] Number of words: 200001 I0823 22:16:36.147938 1083 Decode.cpp:136] Here begin I0823 22:16:36.228821 1083 Decode.cpp:221] Here I0823 22:16:36.228848 1083 Decode.cpp:230] Init device I0823 22:16:36.228873 1083 Decode.cpp:232] [ConvLM]: Loading LM from /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin [ConvLM]: Loading vocabulary from /root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab [ConvLM]: vocabulary size of convLM 221452 I0823 22:16:39.344952 1083 Decode.cpp:248] [Decoder] LM constructed. terminate called after throwing an instance of 'af::exception' what(): ArrayFire Exception (Device out of memory:101): In function virtual void* cuda::Allocator::nativeAlloc(size_t) In file src/backend/cuda/memory.cpp:152 CUDA Error (2): out of memory 0# 0x00007F47B6291B64 in /usr/local/lib/libafcuda.so.3 1# 0x00007F47B6CEB467 in /usr/local/lib/libafcuda.so.3 2# 0x00007F47B6292D06 in /usr/local/lib/libafcuda.so.3 3# 0x00007F47B5D06FD5 in /usr/local/lib/libafcuda.so.3 4# 0x00007F47B5D079A8 in /usr/local/lib/libafcuda.so.3 5# af_get_device_ptr in /usr/local/lib/libafcuda.so.3 6# void* af::array::device<void>() const in /usr/local/lib/libafcuda.so.3 7# 0x00005636348F4F3C in wav2letter/build/Decoder 8# 0x000056363496FEED in wav2letter/build/Decoder 9# 0x000056363495B82F in wav2letter/build/Decoder 10# 0x000056363492C4FE in wav2letter/build/Decoder 11# 0x0000563634941205 in wav2letter/build/Decoder 12# 0x0000563634962F21 in wav2letter/build/Decoder 13# 0x000056363496307D in wav2letter/build/Decoder 14# 0x0000563634913C5A in wav2letter/build/Decoder 15# 0x000056363483E293 *** Aborted at 1598221003 (unix time) try "date -d @1598221003" if you are using GNU date *** PC: @ 0x7f4796035e97 gsignal *** SIGABRT (@0x43b) received by PID 1083 (TID 0x7f47db905380) from PID 1083; stack trace: *** @ 0x7f47d3c1b890 (unknown) @ 0x7f4796035e97 gsignal @ 0x7f4796037801 abort @ 0x7f4796a2a957 (unknown) @ 0x7f4796a30ab6 (unknown) @ 0x7f4796a30af1 std::terminate() @ 0x7f4796a30d24 __cxa_throw @ 0x7f47b6c85728 af::array::device<>() @ 0x5636348f4f3c fl::DevicePtr::DevicePtr() @ 0x56363496feed fl::conv2d() @ 0x56363495b82f fl::AsymmetricConv1D::forward() @ 0x56363492c4fe fl::UnaryModule::forward() @ 0x563634941205 fl::WeightNorm::forward() @ 0x563634962f21 fl::Residual::forward() @ 0x56363496307d fl::Residual::forward() @ 0x563634913c5a fl::Sequential::forward() @ 0x56363483e293 _ZZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEENKUlRKSt6vectorIiSaIiEES8_iiE_clES8_S8_ii @ 0x56363483ef16 _ZNSt17_Function_handlerIFSt6vectorIfSaIfEERKS0_IiSaIiEES6_iiEZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEEUlS6_S6_iiE_E9_M_invokeERKSt9_Any_dataS6_S6_OiSI_ @ 0x5636347a00bd w2l::ConvLM::scoreWithLmIdx() @ 0x5636347a0794 w2l::ConvLM::score() @ 0x563634646346 main @ 0x7f4796018b97 __libc_start_main @ 0x5636346a900a _start Aborted (core dumped)
Any way to take care of the "out of memory" error? Thank you!!
It is working now with saved emissions, cool! Now you have an OOM in the convlm forward pass. To fix it, try reducing --lm_memory from 5000; try --lm_memory=2000, for example, and if you still get OOM, reduce it further.
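For example, a sketch reusing the command from earlier in this thread and only overriding --lm_memory on the command line (if the override does not take effect, set the value in test_decode.cfg instead):
wav2letter/build/Decoder --flagsfile wav2letter/build/test_decode.cfg \
  --minloglevel 0 --logtostderr 1 \
  --lmweight 1 --wordscore 0 --eosscore 0 --silscore 0 --unkscore 0 \
  --smearing max --lm_memory 2000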
Thanks @tlikhomanenko for the quick response! For reducing lm_memory, I have tried quite a few values: 2000, 1000, ..., and finally 1. Unfortunately, each time the OOM error was still there. The following is for --lm_memory=1:
root@5861d058cec8:~# wav2letter/build/Decoder --flagsfile wav2letter/build/test_decode.cfg --minloglevel 0 --logtostderr 1 --lmweight 1 --wordscore 0 --eosscore 0 --silscore 0 --unkscore 0 --smearing max I0823 23:05:53.120163 1114 Decode.cpp:58] Reading flags from file wav2letter/build/test_decode.cfg I0823 23:05:53.120270 1114 Decode.cpp:75] [Network] Reading acoustic model from /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0823 23:05:53.764691 1114 Decode.cpp:79] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> (30) -> (31) -> (32) -> (33) -> (34) -> (35) -> (36) -> (37) -> output] (0): SpecAugment ( W: 80, F: 27, mF: 2, T: 100, p: 1, mT: 2 ) (1): View (-1 80 1 0) (2): Conv2D (1->10, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (3): ReLU (4): Dropout (0.000000) (5): LayerNorm ( axis : { 0 1 2 } , size : -1) (6): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (7): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (8): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (9): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (10): Time-Depth Separable Block (21, 240, 10) [800 -> 2400 -> 800] (11): Conv2D (10->14, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (12): ReLU (13): Dropout (0.000000) (14): LayerNorm ( axis : { 0 1 2 } , size : -1) (15): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (16): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (17): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (18): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (19): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (20): Time-Depth Separable Block (21, 240, 14) [1120 -> 3360 -> 1120] (21): Conv2D (14->18, 21x1, 2,1, SAME,SAME, 1, 1) (with bias) (22): ReLU (23): Dropout (0.000000) (24): LayerNorm ( axis : { 0 1 2 } , size : -1) (25): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (26): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (27): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (28): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (29): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (30): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (31): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (32): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (33): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (34): Time-Depth Separable Block (21, 240, 18) [1440 -> 4320 -> 1440] (35): View (0 1440 1 0) (36): Reorder (1,0,3,2) (37): Linear (1440->9998) (with bias) I0823 23:05:53.764780 1114 Decode.cpp:82] [Criterion] ConnectionistTemporalClassificationCriterion I0823 23:05:53.764782 1114 Decode.cpp:84] [Network] Number of params: 203394122 I0823 23:05:53.764794 1114 Decode.cpp:90] [Network] Updating flags from config file: /root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin I0823 23:05:53.765208 1114 Decode.cpp:106] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; 
--adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/w2l/pre-trained_acoustic_models/am_tds_ctc_librispeech_dev_clean.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=tds_do0.15_l5.6.10_mid3.0_ctc.arch2; --archdir=/private/home/qiantong/push_numbers/200M/do0.15_l5.6.10_mid3.0_incDO; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=100; --beamsizetoken=100; --beamthreshold=20; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=/root/w2l/pre-trained_acoustic_models; --emission_queue_size=3000; --enable_distributed=true; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=wav2letter/build/test_decode.cfg; --framesizems=30; --framestridems=10; --gamma=0.5; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=1500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/root/w2l/decoder/decoder-unigram-10000-nbest10.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/root/w2l/decoder/lm_librispeech_convlm_word_14B.bin; --lm_memory=1; --lm_vocab=/root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab; --lmtype=convlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.29999999999999999; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=8338608; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.5; --netoptim=sgd; --noresample=false; --nthread=10; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO/100_rndv; --rundir=/checkpoint/qiantong/ls_200M/do0.15_l5.6.10_mid3.0_incDO; --runname=100; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=2; --show=true; --showletters=false; --silscore=0; --smearing=max; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=200; --surround=; --tag=; --target=ltr; --test=/root/w2l/lists/new-test-clean.lst; --tokens=librispeech-train-all-unigram-10000.tokens; --tokensdir=/root/w2l/am; --train=/checkpoint/antares/datasets/librispeech/lists/train-clean-100.lst,/checkpoint/antares/datasets/librispeech/lists/train-clean-360.lst,/checkpoint/antares/datasets/librispeech/lists/train-other-500.lst; --trainWithWindow=false; --transdiag=0; --unkscore=0; --use_memcache=false; --uselexicon=true; --usewordpiece=true; --valid=librispeech/dev-other:/checkpoint/antares/datasets/librispeech/lists/dev-other.lst,librispeech/dev-clean:/checkpoint/antares/datasets/librispeech/lists/dev-clean.lst; --validbatchsize=-1; --warmup=1; 
--weightdecay=0; --wordscore=0; --wordseparator=_; --world_rank=0; --world_size=64; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0823 23:05:53.768092 1114 Decode.cpp:127] Number of classes (network): 9998 I0823 23:05:54.727926 1114 Decode.cpp:134] Number of words: 200001 I0823 23:05:54.727947 1114 Decode.cpp:136] Here begin I0823 23:05:54.806488 1114 Decode.cpp:221] Here I0823 23:05:54.806516 1114 Decode.cpp:230] Init device I0823 23:05:54.806540 1114 Decode.cpp:232] [ConvLM]: Loading LM from /root/w2l/decoder/lm_librispeech_convlm_word_14B.bin [ConvLM]: Loading vocabulary from /root/w2l/decoder/lm_librispeech_convlm_word_14B.vocab [ConvLM]: vocabulary size of convLM 221452 I0823 23:05:57.741475 1114 Decode.cpp:248] [Decoder] LM constructed. terminate called after throwing an instance of 'af::exception' what(): ArrayFire Exception (Device out of memory:101): In function virtual void* cuda::Allocator::nativeAlloc(size_t) In file src/backend/cuda/memory.cpp:152 CUDA Error (2): out of memory 0# 0x00007FBC96942B64 in /usr/local/lib/libafcuda.so.3 1# 0x00007FBC9739C467 in /usr/local/lib/libafcuda.so.3 2# 0x00007FBC96943D06 in /usr/local/lib/libafcuda.so.3 3# 0x00007FBC963B7FD5 in /usr/local/lib/libafcuda.so.3 4# 0x00007FBC963B89A8 in /usr/local/lib/libafcuda.so.3 5# af_get_device_ptr in /usr/local/lib/libafcuda.so.3 6# void* af::array::device<void>() const in /usr/local/lib/libafcuda.so.3 7# 0x000055AB403E3F3C in wav2letter/build/Decoder 8# 0x000055AB4045EEED in wav2letter/build/Decoder 9# 0x000055AB4044A82F in wav2letter/build/Decoder 10# 0x000055AB4041B4FE in wav2letter/build/Decoder 11# 0x000055AB40430205 in wav2letter/build/Decoder 12# 0x000055AB40451F21 in wav2letter/build/Decoder 13# 0x000055AB4045207D in wav2letter/build/Decoder 14# 0x000055AB40402C5A in wav2letter/build/Decoder 15# 0x000055AB4032D293 *** Aborted at 1598223962 (unix time) try "date -d @1598223962" if you are using GNU date *** PC: @ 0x7fbc766e6e97 gsignal *** SIGABRT (@0x45a) received by PID 1114 (TID 0x7fbcbbfb6380) from PID 1114; stack trace: *** @ 0x7fbcb42cc890 (unknown) @ 0x7fbc766e6e97 gsignal @ 0x7fbc766e8801 abort @ 0x7fbc770db957 (unknown) @ 0x7fbc770e1ab6 (unknown) @ 0x7fbc770e1af1 std::terminate() @ 0x7fbc770e1d24 __cxa_throw @ 0x7fbc97336728 af::array::device<>() @ 0x55ab403e3f3c fl::DevicePtr::DevicePtr() @ 0x55ab4045eeed fl::conv2d() @ 0x55ab4044a82f fl::AsymmetricConv1D::forward() @ 0x55ab4041b4fe fl::UnaryModule::forward() @ 0x55ab40430205 fl::WeightNorm::forward() @ 0x55ab40451f21 fl::Residual::forward() @ 0x55ab4045207d fl::Residual::forward() @ 0x55ab40402c5a fl::Sequential::forward() @ 0x55ab4032d293 _ZZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEENKUlRKSt6vectorIiSaIiEES8_iiE_clES8_S8_ii @ 0x55ab4032df16 _ZNSt17_Function_handlerIFSt6vectorIfSaIfEERKS0_IiSaIiEES6_iiEZN3w2l27buildGetConvLmScoreFunctionESt10shared_ptrIN2fl6ModuleEEEUlS6_S6_iiE_E9_M_invokeERKSt9_Any_dataS6_S6_OiSI_ @ 0x55ab4028f0bd w2l::ConvLM::scoreWithLmIdx() @ 0x55ab4028f794 w2l::ConvLM::score() @ 0x55ab40135346 main @ 0x7fbc766c9b97 __libc_start_main @ 0x55ab4019800a _start Aborted (core dumped)
Any other way to avoid the OOM? Or has anybody ever run through this on a computer with 1 GPU (4 GB)? Thank you again!
I used --iter=100000 instead of 10000000 to train the TDS CTC model. At decoding, I got:
Any ideas to take care of this? Thank you!