malloc(): memory corruption error while decoding #450

Closed samin9796 closed 4 years ago

samin9796 commented 4 years ago

I get the following error:

./Decoder --flagsfile=/data/ahnaf/wav2letter/dataset_prep/decode.cfg I1129 06:16:10.697783 20154 Decode.cpp:112] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/data/ahnaf/wav2letter/dataset_prep/all_models/wer_29/sust_full2/001_model_validation.lst.bin; --arch=network.arch; --archdir=/data/ahnaf/wav2letter/dataset_prep/; --attention=content; --attentionthreshold=0; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=500; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=/data/ahnaf/wav2letter/dataset_prep/; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --enable_distributed=true; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=/data/ahnaf/wav2letter/dataset_prep/decode.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --hardselection=1; --input=wav; --inputbinsize=100; --inputfeeding=false; --iter=500; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/data/ahnaf/wav2letter/dataset_prep/lexicon.txt; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=/data/ahnaf/speech/warp-ctc/pytorch_binding/audio/ctcdecode/deepspeech.pytorch_new/language_models/bangla_5gram_lm.binary; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=2.5; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.10000000000000001; --lrcosine=false; --lrcrit=0.0060000000000000001; --maxdecoderoutputlen=200; --maxgradnorm=0.20000000000000001; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; 
--melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.80000000000000004; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=4; --numattnhead=8; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/data/ahnaf/wav2letter/dataset_prep/; --runname=sust_full2; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=/data/ahnaf/wav2letter/dataset_prep/sclite; --seed=0; --show=true; --showletters=false; --silweight=-0.5; --smearing=max; --smoothingtemperature=1; --softselection=inf; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=false; --stepsize=1000000; --surround=; --tag=; --target=tkn; --test=validation.lst; --tokens=tokens.txt; --tokensdir=/data/ahnaf/wav2letter/dataset_prep/; --train=train.lst; --trainWithWindow=false; --transdiag=0; --unkweight=-inf; --uselexicon=true; --usewordpiece=false; --valid=validation.lst; --weightdecay=0; --wordscore=1; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I1129 06:16:10.698246 20154 Decode.cpp:133] Number of classes (network): 179 I1129 06:16:10.771031 20154 Decode.cpp:140] Number of words: 38470 I1129 06:16:10.920435 20154 W2lListFilesDataset.cpp:141] 6594 files found. 
I1129 06:16:10.920583 20154 Utils.cpp:102] Filtered 0/6594 samples
I1129 06:16:10.921502 20154 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 6594
I1129 06:16:10.922315 20154 Decode.cpp:154] [Serialization] Running forward pass ...
I1129 06:20:31.829942 20154 Decode.cpp:201] [Dataset] Number of samples per thread: 1649
I1129 06:23:24.582834 20154 Decode.cpp:308] [Decoder] LM constructed.
malloc(): memory corruption
*** Aborted at 1575037404 (unix time) try "date -d @1575037404" if you are using GNU date ***
PC: @ 0x7ff18601ee97 gsignal
*** SIGABRT (@0x3eb00004eba) received by PID 20154 (TID 0x7ff1ad886380) from PID 20154; stack trace: ***
@ 0x7ff1ac785890 (unknown)
@ 0x7ff18601ee97 gsignal
@ 0x7ff186020801 abort
@ 0x7ff186069897 (unknown)
@ 0x7ff18607090a (unknown)
@ 0x7ff186074994 (unknown)
@ 0x7ff1860772ed __libc_malloc
@ 0x7ff186c49258 operator new()
@ 0x55e61c2ecbe1 w2l::KenLM::score()
@ 0x55e61c1cca1e main
@ 0x7ff186001b97 __libc_start_main
@ 0x55e61c21da0a _start
Aborted (core dumped)

lunixbochs commented 4 years ago

could be related to this https://github.com/facebookresearch/wav2letter/issues/441

tlikhomanenko commented 4 years ago

@samin9796, @lunixbochs,

I don't think the problem is with the LM, because the log shows "LM constructed". It may be related to the trie construction.

@samin9796, could you send details about your lexicon file (e.g. number of words, head of file)?

samin9796 commented 4 years ago

@tlikhomanenko There are 38470 words in the lexicon file. I am working with the Bengali language. Here is the head of the file:

বিদেশীর ব ি দ ে শ ী র |
কৃতিত্বের ক ৃ ত ি ত ্ ব ে র |
মুর্তিটাকে ম ু র ্ ত ি ট া ক ে |
ফেনী ফ ে ন ী |
তিরুবনন্তপুরম ত ি র ু ব ন ন ্ ত প ু র ম |
ভূতেদের ভ ূ ত ে দ ে র |
উঁচিয়ে উ ঁ চ ি য ় ে |
ডিকো ড ি ক ো |
নামসর্বস্ব ন া ম স র ্ ব স ্ ব |
ঘটনাগুনো ঘ ট ন া গ ু ন ো |
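For anyone gathering the same lexicon details, here is a minimal Python sketch. It assumes each lexicon line is a word followed by its whitespace-separated spelling (as in the sample above) and that the tokens file lists one token per line. The function name `summarize_lexicon` is made up for illustration and is not part of wav2letter. Whether tokens missing from the tokens file have anything to do with this crash is not established in this thread; the check is only a quick sanity pass.

```python
import sys
from collections import Counter
from typing import Dict, Iterable, Set


def summarize_lexicon(lines: Iterable[str], tokens: Set[str]) -> Dict[str, object]:
    """Count lexicon entries and spelling tokens absent from the token set.

    Hypothetical helper: assumes "word tok tok ... tok" per line.
    """
    entries = 0
    missing: Counter = Counter()
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        entries += 1
        # parts[0] is the word itself; the rest is its spelling.
        for tok in parts[1:]:
            if tok not in tokens:
                missing[tok] += 1
    return {"entries": entries, "missing_tokens": dict(missing)}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_lexicon.py lexicon.txt tokens.txt
    lexicon_path, tokens_path = sys.argv[1], sys.argv[2]
    with open(tokens_path, encoding="utf-8") as f:
        token_set = {ln.strip() for ln in f if ln.strip()}
    with open(lexicon_path, encoding="utf-8") as f:
        print(summarize_lexicon(f, token_set))
```

The entry count can be compared with the "Number of words" line in the decoder log, and a non-empty `missing_tokens` would at least be worth mentioning when reporting the issue.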

tlikhomanenko commented 4 years ago


Could you also provide the head (-n50) of the ARPA file for your language model? Does the problem still exist for you? (This seems to be a duplicate of https://github.com/facebookresearch/wav2letter/issues/459.)
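A small Python sketch for collecting that information, in case it helps. It prints the first 50 lines of a text ARPA file and reads the n-gram counts from its `\data\` header. It assumes the text ARPA file is available (the `--lm` flag in the log points at a `.binary` file, which this does not read), and `arpa_ngram_counts` is a made-up helper name, not a KenLM or wav2letter function.

```python
import re
import sys
from itertools import islice
from typing import Dict, Iterable

# Header lines in an ARPA file look like "ngram 1=38470".
_NGRAM_RE = re.compile(r"^ngram\s+(\d+)\s*=\s*(\d+)$")


def arpa_ngram_counts(lines: Iterable[str]) -> Dict[int, int]:
    """Return {order: count} parsed from the \\data\\ header of an ARPA file."""
    counts: Dict[int, int] = {}
    in_header = False
    for raw in lines:
        line = raw.strip()
        if line == "\\data\\":
            in_header = True
            continue
        if not in_header:
            continue
        match = _NGRAM_RE.match(line)
        if match:
            counts[int(match.group(1))] = int(match.group(2))
        elif line.startswith("\\"):
            break  # a section marker such as \1-grams: ends the header
    return counts


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python arpa_head.py path/to/model.arpa
    with open(sys.argv[1], encoding="utf-8", errors="replace") as f:
        head = list(islice(f, 50))
    print("".join(head), end="")
    print(arpa_ngram_counts(head))
```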

tlikhomanenko commented 4 years ago

Duplicate of https://github.com/facebookresearch/wav2letter/issues/460