flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

recipe for librispeech, why no decoder configuration file example? #295

Closed jindyliu closed 5 years ago

jindyliu commented 5 years ago

Why is there no configuration file for the decoder, such as decoder.cfg, but only train.cfg? https://github.com/facebookresearch/wav2letter/tree/master/recipes/librispeech/configs/seq2seq_tds
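For reference, decoding is driven by the same gflags mechanism as training: you pass the Decode binary a flags file with one --flag=value per line, just like train.cfg. A minimal sketch of the invocation (the build location and file paths below are placeholders, not part of the recipe):

/path/to/wav2letter/build/Decode --flagsfile=/path/to/decode.cfg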

I completed training with the example training configuration, but when I used the trained model to decode, I got the following error:

I0515 03:23:47.965700 889 Utils-base.cpp:305] Falling back to using letters as targets for the unknown word 'rodolfo'
I0515 03:23:48.019260 889 Dictionary.cpp:53] Skipping unknown token: 'rodolfo'
I0515 03:23:48.019330 889 Dictionary.cpp:53] Skipping unknown token: 'rodolfo'
I0515 03:23:48.053910 889 Utils-base.cpp:305] Falling back to using letters as targets for the unknown word 'chiaroscurists'
I0515 03:23:48.102396 889 Dictionary.cpp:53] Skipping unknown token: 'chiaroscurists'
I0515 03:23:48.783941 886 Decode.cpp:165] [Dataset] Number of samples per thread: 82
F0515 03:23:48.784000 886 Decode.cpp:181] [Decoder] Invalid model type: seq2seq
Check failure stack trace:
    @ 0x7f430fabcbcd google::LogMessage::Fail()
    @ 0x7f430fabf86f google::LogMessage::SendToLog()
    @ 0x7f430fabc763 google::LogMessage::Flush()
    @ 0x7f430fabe15e google::LogMessageFatal::~LogMessageFatal()
    @ 0x41b37c main
    @ 0x7f430efc33d5 __libc_start_main
    @ 0x46fe5c (unknown)
Aborted

The code corresponding to the error: https://github.com/facebookresearch/wav2letter/blob/master/Decode.cpp#L177

// Decode.cpp only supports ASG and CTC acoustic models: any other
// --criterion value (e.g. seq2seq) falls through to the LOG(FATAL) below.
ModelType modelType = ModelType::ASG;
if (FLAGS_criterion == kCtcCriterion) {
  modelType = ModelType::CTC;
} else if (FLAGS_criterion != kAsgCriterion) {
  LOG(FATAL) << "[Decoder] Invalid model type: " << FLAGS_criterion;
}

The decoding configuration is as follows:

Updating flags from config file: /root/volume/e2e_data_speech/seq2seq_tds_distributed_500/004_model_dev-clean.bin I0515 03:19:49.054301 882 Decode.cpp:106] Gflags after parsing --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=/root/volume/e2e_data_speech/seq2seq_tds_distributed_500/004_model_dev-clean.bin; --arch=network.arch; --archdir=/root/volume/w2l_seq2seq_cfg; --attention=keyvalue; --attnWindow=softPretrain; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=16; --beamscore=25; --beamsize=1000; --channels=1; --criterion=seq2seq; --critoptim=sgd; --datadir=/root/volume/e2e_data_speech; --dataorder=output_spiral; --devwin=0; --emission_dir=; --enable_distributed=true; --encoderdim=512; --eostoken=true; --everstoredb=false; --fftcachesize=1; --filterbanks=80; --flagsfile=/root/volume/w2l_seq2seq_cfg/decode.cfg; --forceendsil=false; --gamma=0.5; --garbage=false; --input=flac; --inputbinsize=25; --inputfeeding=false; --iter=200; --itersave=false; --labelsmooth=0.050000000000000003; --leftWindowSize=50; --lexicon=/root/volume/e2e_data_speech/seq2seq/librispeech-train+dev-unigram-10000-nbest10.dict; --linlr=-1; --linlrcrit=-1; --linseg=0; --listdata=true; --lm=/root/volume/data_speech/lm/4-gram.bin; --lmtype=kenlm; --lmweight=2.5; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.050000000000000003; --lrcrit=0.050000000000000003; --maxdecoderoutputlen=120; --maxgradnorm=15; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=4194304; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=32; --onorm=none; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=99; --pcttraineval=1; --pow=false; --pretrainWindow=3; --replabel=0; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/root/volume/e2e_data_speech; --runname=seq2seq_tds_distributed_500; --samplerate=16000; --sampletarget=0.01; --samplingstrategy=rand; --sclite=/root/volume/e2e_data_speech/decode_log/; --seed=0; --show=true; --showletters=true; --silweight=-0.5; --smearing=max; --softwoffset=10; --softwrate=5; --softwstd=4; --sqnorm=false; --stepsize=40; --surround=; --tag=; --target=ltr; --test=test-clean.lst; --tokens=librispeech-train-all-unigram-10000.vocab-filtered; --tokensdir=/root/volume/e2e_data_speech/seq2seq; --train=/root/volume/e2e_data_speech/train-clean-100.lst,/root/volume/e2e_data_speech/train-clean-360.lst,/root/volume/e2e_data_speech/train-other-500.lst; --trainWithWindow=true; --transdiag=0; --unkweight=-inf; --usewordpiece=true; --valid=dev-clean:/root/volume/e2e_data_speech/dev-clean.lst,dev-other:/root/volume/e2e_dataspeech/dev-other.lst; --weightdecay=0; --wordscore=1; --wordseparator=; --world_rank=0; --world_size=1; --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; I0515 03:19:49.059711 882 Decode.cpp:112] Number of classes (network): 9998 I0515 03:19:50.217909 882 Decode.cpp:116] Number of words: 89612

So how should I generate a correct decoder configuration? The decoder does not seem to support --criterion=seq2seq.

lunixbochs commented 5 years ago

You should use the conv_glu or ctc recipe for now, then. I've been getting pretty good results with conv_glu. You can download my 2.64 clean-TER model here: https://talonvoice.com/research/
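In case it is useful, here is a rough, untested sketch of what a decode.cfg for an ASG/CTC acoustic model (e.g. conv_glu) could look like. It only uses flags that already appear in the gflags dump above; all paths are placeholders, and values like lmweight, wordscore, silweight, and beamsize normally need tuning per model. Note that the criterion itself is read back from the serialized acoustic model, so the decoder accepts the model only if it was trained with asg or ctc:

--am=/path/to/am/model_last.bin
--tokensdir=/path/to/am
--tokens=tokens.txt
--lexicon=/path/to/lexicon.txt
--datadir=/path/to/lists
--test=test-clean.lst
--lm=/path/to/4-gram.bin
--lmtype=kenlm
--lmweight=2.5
--wordscore=1
--silweight=-0.5
--beamsize=1000
--beamscore=25
--smearing=max
--nthread_decoder=8
--show=true
--sclite=/path/to/decode_log/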

xuqiantong commented 5 years ago

Hi @jindyliu, we are working on new APIs for several new decoders. We will support seq2seq decoding in the near future.

jindyliu commented 5 years ago

You should use the conv_glu or ctc recipe for now, then. I've been getting pretty good results with conv_glu. You can download my 2.64 clean-TER model here: https://talonvoice.com/research/

Thank you! This will save me a lot of time; it helps a lot.

jindyliu commented 5 years ago

Hi @jindyliu, we are working on new APIs for several new decoders. We will support seq2seq decoding in the near future.

Thank you! Looking forward to it.

zdgithub commented 5 years ago

@lunixbochs Hi, I want to train the conv_glu librispeech recipe, but the loss didn't decrease and TER was 100% all the time no matter how I adjusted the learning rate. Could you share your train.cfg file for librispeech? I would be very grateful.

lunixbochs commented 5 years ago

I’m using the same recipe as you. Just continue training from one of my posted models.
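For anyone else landing here, a rough sketch of what continuing from an existing model usually looks like with the wav2letter++ Train binary (mode names are from my recollection of its usage string; paths are placeholders and untested):

# resume the same run from its run directory
/path/to/wav2letter/build/Train continue /path/to/rundir/my_conv_glu_run

# or start a new run initialized from a downloaded model
/path/to/wav2letter/build/Train fork /path/to/downloaded_model.bin --flagsfile=train.cfg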
