flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Error init training - Segmentation fault (core dumped) on `wrd2Target()` #554

Closed abramovi closed 4 years ago

abramovi commented 4 years ago

Hi.

I am trying to train using the librispeech recipe on my own small dataset.

I am using the following config:

--runname=ted_test --rundir=/data/ --datadir=/data/data/input/ --archdir=/data/ --train=lists/train.lst --valid=lists/valid.lst --input=wav --arch=network.arch --tokens=/data/data/output/am/tokens.txt --lexicon=/data/data/output/am/lexicon.txt --criterion=ctc --lr=0.1 --maxgradnorm=1.0 --replabel=1 --surround=| --onorm=target --sqnorm=true --mfsc=true --filterbanks=40 --nthread=4 --batchsize=4 --iter=25 --listdata=false
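For reference, the same options can also be written into a gflags flags file, one `--flag=value` per line, and passed via `--flagsfile` (as in the `/data/train.cfg` run shown later in this thread). An illustrative excerpt, not the poster's actual file:

```
# /data/train.cfg -- same options as the command line above, one per line
--runname=ted_test
--rundir=/data/
--datadir=/data/data/input/
--train=lists/train.lst
--valid=lists/valid.lst
--tokens=/data/data/output/am/tokens.txt
--lexicon=/data/data/output/am/lexicon.txt
--criterion=ctc
--lr=0.1
--batchsize=4
--iter=25
# ... remaining flags exactly as in the command line above
```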

and I am getting the following:

Falling back to using letters as targets for the unknown word: swan
Falling back to using letters as targets for the unknown word: dancer's
Falling back to using letters as targets for the unknown word: arabesques
Falling back to using letters as targets for the unknown word: plies
Falling back to using letters as targets for the unknown word: microseconds
Falling back to using letters as targets for the unknown word: starter
Falling back to using letters as targets for the unknown word: styled
Falling back to using letters as targets for the unknown word: bores
Falling back to using letters as targets for the unknown word: underpaid
Aborted at 1582491930 (unix time) try "date -d @1582491930" if you are using GNU date
PC: @ 0x563922 w2l::wrd2Target()
SIGSEGV (@0x8) received by PID 393 (TID 0x7fb7d720e600) from PID 8; stack trace:
@ 0x7fb7918d7390 (unknown)
@ 0x563922 w2l::wrd2Target()
@ 0x565b5b w2l::wrd2Target()
@ 0x5d3669 w2l::W2lListFilesDataset::loadListFile()
@ 0x5d4059 w2l::W2lListFilesDataset::W2lListFilesDataset()
@ 0x5c1ee0 w2l::createDataset()
@ 0x41b0a1 main
@ 0x7fb789a15830 __libc_start_main
@ 0x479279 _start
@ 0x0 (unknown)
Segmentation fault (core dumped)

tlikhomanenko commented 4 years ago

Hi @abramovi,

The lexicon file is used to map each word to a token sequence for the target transcription; in the CTC case we learn a probability for each token at each frame. If we meet a word (from the train or valid lists) which is not listed in the lexicon, we don't know how to map it to the tokens set. In w2l we fall back to the word's letter sequence, but then all of those letters must be in the tokens set. So my guess is that you have some word which is absent from the lexicon and whose letters are not all in the tokens file. Please check this.

The usual practice is to construct the lexicon from all words in the train and valid transcriptions.
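To make that mapping concrete, here is a minimal illustration (made-up contents, not the poster's actual files) of a letter-token setup: tokens.txt lists one token per line, and each line of lexicon.txt maps a word to its space-separated token spelling. A word that is absent from the lexicon and contains a character not present in tokens.txt (a digit, for instance) cannot be handled by the letter fallback, which is the situation described above.

```
# tokens.txt: one token per line ("|" is the word separator)
|
'
a
b
...
z

# lexicon.txt: word followed by its token spelling
hello  h e l l o |
world  w o r l d |
swan   s w a n |
```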

abramovi commented 4 years ago

Thank you @tlikhomanenko for your answer.

I rebuilt my lexicon - it now has the words in it.

here is my current full log:

root@8170d4db421f:~/wav2letter/build# ./Train train --flagsfile /data/train.cfg
I0224 21:35:40.914047 84 Train.cpp:141] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=network.arch; --archdir=/data; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=/data/data/output; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=/data/train.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=wav; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=25; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/data/data/output/am/lexicon.txt; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.10000000000000001; --lrcosine=false; --lrcrit=0; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=4; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=1; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/data/data/; --runname=librispeech_clean_trainlogs; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=; --seed=0; --show=false; --showletters=false; --silscore=0; --smearing=none; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=|; --tag=; --target=tkn; --test=; --tokens=tokens.txt; --tokensdir=/data/data/output/am; --train=lists/train.lst.fix; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=lists/valid.lst.fix; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=;
I0224 21:35:40.914216 84 Train.cpp:142] Experiment path: /data/data/librispeech_clean_trainlogs
I0224 21:35:40.914271 84 Train.cpp:143] Experiment runidx: 1
I0224 21:35:40.922464 84 Train.cpp:187] Number of classes (network): 29
I0224 21:35:41.045508 84 Train.cpp:194] Number of words: 54054
I0224 21:35:41.068939 84 Train.cpp:208] Loading architecture file from /data/network.arch
I0224 21:35:41.077917 84 Train.cpp:240] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> output]
(0): View (-1 1 40 0)
(1): Conv2D (40->256, 8x1, 2,1, SAME,SAME, 1, 1) (with bias)
(2): ReLU
(3): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(4): ReLU
(5): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(6): ReLU
(7): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(8): ReLU
(9): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(10): ReLU
(11): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(12): ReLU
(13): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(14): ReLU
(15): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
(16): ReLU
(17): Reorder (2,0,3,1)
(18): Linear (256->512) (with bias)
(19): ReLU
(20): Linear (512->29) (with bias)
I0224 21:35:41.077971 84 Train.cpp:241] [Network Params: 3900445]
I0224 21:35:41.077986 84 Train.cpp:242] [Criterion] AutoSegmentationCriterion
I0224 21:35:41.078006 84 Train.cpp:250] [Network Optimizer] SGD
I0224 21:35:41.078037 84 Train.cpp:251] [Criterion Optimizer] SGD
Aborted at 1582580142 (unix time) try "date -d @1582580142" if you are using GNU date
PC: @ 0x5ca492 w2l::wrd2Target()
SIGSEGV (@0x8) received by PID 84 (TID 0x7f56544bdbc0) from PID 8; stack trace:
@ 0x7f5649f6c390 (unknown)
@ 0x5ca492 w2l::wrd2Target()
@ 0x5cbfeb w2l::wrd2Target()
@ 0x61ddc3 w2l::W2lListFilesDataset::loadListFile()
@ 0x61e859 w2l::W2lListFilesDataset::W2lListFilesDataset()
@ 0x62fb1a w2l::createDataset()
@ 0x41ac91 main
@ 0x7f56490e1830 __libc_start_main
@ 0x48ce49 _start
@ 0x0 (unknown)
Segmentation fault

I am not sure that it is related to my lexicon, as I used prepare-data.py to build it from all my data.

Any idea how to find out which word is missing?
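One way to answer this is to scan the list files yourself and print every transcription word that is neither in the lexicon nor spellable from the tokens file, since those are exactly the words the letter fallback cannot handle. Below is a rough sketch (not part of wav2letter); it assumes the usual list format `<id> <audio path> <size> <transcription ...>`, a tokens file with one token per line, and a lexicon whose first column is the word:

```python
#!/usr/bin/env python3
"""Sanity check (not part of wav2letter): list every word that appears in the
train/valid transcriptions, is missing from the lexicon, AND contains a
character that is not in the tokens file -- i.e. words the letter fallback
cannot map."""
import sys

def load_tokens(path):
    # tokens.txt: one token per line
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def load_lexicon_words(path):
    # lexicon.txt: first column is the word, the rest is its token spelling
    with open(path, encoding="utf-8") as f:
        return {line.split()[0] for line in f if line.strip()}

def main(tokens_file, lexicon_file, *list_files):
    tokens = load_tokens(tokens_file)
    lexicon = load_lexicon_words(lexicon_file)
    bad = {}
    for lst in list_files:
        with open(lst, encoding="utf-8") as f:
            for line in f:
                cols = line.strip().split()
                # assumed list format: <id> <audio path> <size> <transcription ...>
                for word in cols[3:]:
                    if word in lexicon:
                        continue
                    missing = {ch for ch in word if ch not in tokens}
                    if missing:
                        bad.setdefault(word, missing)
    for word, missing in sorted(bad.items()):
        print(f"{word}\tcharacters not in tokens file: {' '.join(sorted(missing))}")

if __name__ == "__main__":
    main(*sys.argv[1:])
```

Running it as, say, `python3 check_coverage.py tokens.txt lexicon.txt lists/train.lst lists/valid.lst` (hypothetical script name) would print the words that are likely triggering the crash in wrd2Target().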

tlikhomanenko commented 4 years ago

Could you attach your tokens set file and your generated lexicon? I want to have a look at them to make sure they are fine.

abramovi commented 4 years ago

Thank you!!!! The issue was, as you said, related to digits in my lexicon and tokens files.

tranmanhdat commented 3 years ago

Hi @abramovi,

The lexicon file is used to map each word to a token sequence for the target transcription; in the CTC case we learn a probability for each token at each frame. If we meet a word (from the train or valid lists) which is not listed in the lexicon, we don't know how to map it to the tokens set. In w2l we fall back to the word's letter sequence, but then all of those letters must be in the tokens set. So my guess is that you have some word which is absent from the lexicon and whose letters are not all in the tokens file. Please check this.

The usual practice is to construct the lexicon from all words in the train and valid transcriptions.

What should I do when I have some words that are not in the lexicon/tokens files but that appear in the train/val text, and I don't want to add them to the lexicon/tokens files?

tlikhomanenko commented 3 years ago

Do you want to skip these words during training at all?

tranmanhdat commented 3 years ago

Do you want to skip these words during training at all?

Yes. Should I replace these words in the train/val transcripts, or is there another way?

tranmanhdat commented 3 years ago

Do you want to skip these words during training at all?

I skipped all samples that have unknown words, but now I have another problem (reported in a separate issue).

tlikhomanenko commented 3 years ago

We have a fallback to letters for unknown words, so with letter tokens only, any word from the training/val transcriptions can still be mapped (only unknown letters are skipped). So you need to preprocess your lists so that the words you want skipped are already removed before running the Train binary.
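If it helps, one way such preprocessing could look is a small standalone script (a sketch, not part of wav2letter; it assumes the usual list format `<id> <audio path> <size> <transcription ...>` and hypothetical file names) that keeps only the samples whose words are all covered by the lexicon:

```python
#!/usr/bin/env python3
"""Illustrative preprocessing (not part of wav2letter): write a new list file
that keeps only the samples whose transcription words are all in the lexicon.
Assumed list format: <id> <audio path> <size> <transcription ...>."""
import sys

def main(lexicon_file, in_list, out_list):
    # lexicon.txt: first column is the word
    with open(lexicon_file, encoding="utf-8") as f:
        lexicon = {line.split()[0] for line in f if line.strip()}
    kept = dropped = 0
    with open(in_list, encoding="utf-8") as fin, \
         open(out_list, "w", encoding="utf-8") as fout:
        for line in fin:
            cols = line.strip().split()
            # keep the sample only if every transcription word is in the lexicon
            if all(w in lexicon for w in cols[3:]):
                fout.write(line)
                kept += 1
            else:
                dropped += 1
    print(f"kept {kept}, dropped {dropped} -> {out_list}")

if __name__ == "__main__":
    main(*sys.argv[1:])
```

Something like `python3 filter_list.py lexicon.txt lists/train.lst lists/train.lst.filtered` (hypothetical names) then gives a list Train can consume; a variant could instead strip only the offending words from the transcription column rather than dropping whole samples, depending on how much data you can afford to lose.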