flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki
Other
6.39k stars 1.01k forks source link

streaming_tds_model_converter [Serialization Error] #791

Closed LeonKennedy closed 4 years ago

LeonKennedy commented 4 years ago

Bug Description

i get serialization error when run streaming_tds_model_converter [Serialization Error] Mismatched output w2l:10.137 vs streaming:9.8056

log: I0821 09:39:45.692654 933 StreamingTDSModelConverter.cpp:174] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/mnt/data/speech/qkids-data/sconv/001model#mnt#data#speech#qkids-data#lists#dev.lst.bin; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=am_qkids.arch; --archdir=/root/main/stream_conv; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=8; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=/root/main/stream_conv/train_qkids.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=wav; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=9223372036854775807; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/mnt/data/speech/qkids-data/am/qkids-train+dev-unigram-2882-nbest5.lexicon; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0; --localnrmlleftctx=300; --localnrmlrightctx=0; --logadd=false; --lr=0.40000000000000002; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=0.5; --maxisz=33000; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=200; --minrate=3; --minsil=0; --mintsz=2; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=8; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=1; --pow=false; --pretrainWindow=0; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/mnt/data/speech/qkids-data; --runname=sconv; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=false; --showletters=false; --silscore=0; --smearing=none; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=9223372036854775807; --surround=; --tag=; --target=tkn; --test=; --tokens=qkids-train-all-unigram-2882.tokens; --tokensdir=/mnt/data/speech/qkids-data/am/; --train=/mnt/data/speech/qkids-data/lists/train.lst; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --usememcache=false; --uselexicon=true; --usewordpiece=true; --valid=/mnt/data/speech/qkids-data/lists/dev.lst; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=; --world_rank=0; --world_size=1; --outdir=/mnt/data/speech/qkids-data/sconv/infer; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0821 09:39:45.695264 933 StreamingTDSModelConverter.cpp:192] Number of classes (network): 2880 Skipping View module: V -1 NFEAT 1 0 Skipping SpecAugment module: SAUG 40 27 2 100 1.0 2 Skipping Dropout module: DO 0.1 Skipping Dropout module: DO 0.1 Skipping Dropout module: DO 0.1 Skipping Reorder module: RO 2 1 0 3 Skipping View module: V 1000 -1 1 0 Skipping View module: V NLABEL 0 -1 1 I0821 09:39:47.084893 933 StreamingTDSModelConverter.cpp:289] Serializing acoustic model to '/mnt/data/speech/qkids-data/sconv/infer/acoustic_model.bin' I0821 09:39:47.384083 933 StreamingTDSModelConverter.cpp:301] Writing tokens file to '/mnt/data/speech/qkids-data/sconv/infer/tokens.txt' I0821 09:39:47.392366 933 StreamingTDSModelConverter.cpp:328] Serializing feature extraction model to '/mnt/data/speech/qkids-data/sconv/infer/feature_extractor.bin' I0821 09:39:47.507016 933 StreamingTDSModelConverter.cpp:344] verifying serialization ... F0821 09:39:52.450640 933 StreamingTDSModelConverter.cpp:368] [Serialization Error] Mismatched output w2l:10.137 vs streaming:9.8056 Check failure stack trace: @ 0x7fe109aed0cd google::LogMessage::Fail() @ 0x7fe109aeef33 google::LogMessage::SendToLog() @ 0x7fe109aecc28 google::LogMessage::Flush() @ 0x7fe109aef999 google::LogMessageFatal::~LogMessageFatal() @ 0x55f5a318982e main @ 0x7fe10193ab97 __libc_start_main @ 0x55f5a31f4f0a _start Aborted (core dumped)

Reproduction Steps

  1. run docker
  2. cd wav2letter/build
  3. cmake .. -DCMAKE_BUILD_TYPE=Release -DW2L_BUILD_TOOLS=ON
  4. make streaming_tds_model_converter
  5. ./streaming_tds_model_converter --am=lists#dev.lst.bin --output='...'

Platform and Hardware

Ubantu , GPU nvidait t2080ti * 1 , cmake 3.10.3

Additional Context

i use recipers/models/streaming_convents with my own data

vineelpratap commented 4 years ago

Hi, Could you also paste the arch file here.

joazoa commented 4 years ago

I have seen this a lot, the model will still be generated and work.

LeonKennedy commented 4 years ago

arch file: V -1 NFEAT 1 0 SAUG 40 27 2 100 1.0 2 PD 0 5 3 C2 1 15 10 1 2 1 0 0 R DO 0.1 LN 1 2 TDS 15 9 40 0.1 0 1 0 TDS 15 9 40 0.1 0 1 0 PD 0 7 1 C2 15 19 10 1 2 1 0 0 R DO 0.1 LN 1 2 TDS 19 11 40 0.1 0 1 0 TDS 19 11 40 0.1 0 1 0 TDS 19 11 40 0.1 0 1 0 PD 0 10 0 C2 19 25 11 1 1 1 0 0 R DO 0.1 LN 1 2 TDS 25 11 40 0.1 0 0 0 TDS 25 11 40 0.1 0 0 0 TDS 25 11 40 0.1 0 0 0 TDS 25 11 40 0.1 0 0 0 RO 2 1 0 3 V 1000 -1 1 0 L 1000 NLABEL V NLABEL 0 -1 1

LeonKennedy commented 4 years ago

I have seen this a lot, the model will still be generated and work.

yes, I got acoustic_model.bin, tokens.txt, feature_extractor.bin and there are works. bug why occurring Serialization Error?

vineelpratap commented 4 years ago

Hi, The arch file looks good. The mismatch seems minimal - 10.137 vs 9.8056. So, it should be okay to use this. Note that the inference is run at fp16 precision and hence the difference. But, the model should be pretty robust to these errors.