flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

'continue' option doesn't seem to do anything #321

Closed · SY-nc closed this issue 5 years ago

SY-nc commented 5 years ago

Hi. I trained the model on 3,000 samples from Librispeech-clean. I had set --iter=1, --nthread=1, --batchsize=1, since I only wanted to check that everything works. Training completed in 12 minutes. Then I added 1,000 more samples to that directory (making sure the file names stay in sequence).

When I tried the continue option, training finished almost instantly. Here is my output:

satishyadav@satishyadav-Dell:~/speech$ sudo wav2letter/build/Train continue /home/satishyadav/speech/mytest/librispeech_clean_trainlogs --flagsfile wav2letter/tutorials/1-librispeech_clean/train.cfg --logtostderr=1
I0604 16:09:07.252578 28360 Train.cpp:78] Parsing command line flags
I0604 16:09:07.252590 28360 Train.cpp:79] Overriding flags should be mutable when using `continue`
I0604 16:09:07.252602 28360 Train.cpp:83] Reading flags from file wav2letter/tutorials/1-librispeech_clean/train.cfg
I0604 16:09:07.265381 28360 Train.cpp:136] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --arch=network.arch; --archdir=/home/satishyadav/speech/wav2letter/tutorials/1-librispeech_clean/; --attention=content; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=2500; --beamthreshold=25; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=/home/satishyadav/speech/mytest/; --dataorder=input; --decodertype=wrd; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/tutorials/1-librispeech_clean/train.cfg; --gamma=1; --garbage=false; --input=flac; --inputbinsize=100; --inputfeeding=false; --iter=1; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=; --linlr=-1; --linlrcrit=-1; --linseg=0; --listdata=false; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.10000000000000001; --lrcrit=0; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=1; --nthread_decoder=1; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=2; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/home/satishyadav/speech/mytest/; --runname=librispeech_clean_trainlogs; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=; --seed=0; --show=false; --showletters=false; --silweight=0; --smearing=none; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=|; --tag=; --target=tkn; --test=; --tokens=data/tokens.txt; --tokensdir=/home/satishyadav/speech/mytest/; --train=data/train-clean-100; --trainWithWindow=false; --transdiag=0; --unkweight=-inf; --usewordpiece=false; --valid=data/dev-clean; --weightdecay=0; --wordscore=1; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0604 16:09:07.265401 28360 Train.cpp:137] Experiment path: /home/satishyadav/speech/mytest/librispeech_clean_trainlogs
I0604 16:09:07.265404 28360 Train.cpp:138] Experiment runidx: 2
I0604 16:09:07.265743 28360 Train.cpp:166] Number of classes (network): 31
I0604 16:09:07.487443 28360 Train.cpp:208] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> output]
    (0): View (-1 1 40 0)
    (1): Conv2D (40->256, 8x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (2): ReLU
    (3): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (4): ReLU
    (5): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (6): ReLU
    (7): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (8): ReLU
    (9): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (10): ReLU
    (11): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (12): ReLU
    (13): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (14): ReLU
    (15): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (16): ReLU
    (17): Reorder (2,0,3,1)
    (18): Linear (256->512) (with bias)
    (19): ReLU
    (20): Linear (512->31) (with bias)
I0604 16:09:07.487488 28360 Train.cpp:209] [Network Params: 3901471]
I0604 16:09:07.487494 28360 Train.cpp:210] [Criterion] ConnectionistTemporalClassificationCriterion
I0604 16:09:07.487498 28360 Train.cpp:218] [Network Optimizer] SGD
I0604 16:09:07.487502 28360 Train.cpp:219] [Criterion Optimizer] SGD
I0604 16:09:07.487740 28360 NumberedFilesLoader.cpp:29] Adding dataset /home/satishyadav/speech/mytest/data/train-clean-100 ...
I0604 16:09:07.487838 28360 NumberedFilesLoader.cpp:68] 4000 files found. 
I0604 16:09:07.509717 28360 Utils.cpp:102] Filtered 0/4000 samples
I0604 16:09:07.510088 28360 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 4000
I0604 16:09:07.510161 28360 NumberedFilesLoader.cpp:29] Adding dataset /home/satishyadav/speech/mytest/data/dev-clean ...
I0604 16:09:07.510243 28360 NumberedFilesLoader.cpp:68] 2703 files found. 
I0604 16:09:07.522277 28360 Utils.cpp:102] Filtered 0/2703 samples
I0604 16:09:07.522512 28360 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 2703
I0604 16:09:07.522804 28360 Train.cpp:640] Finished training
satishyadav@satishyadav-Dell:~/speech$ 

Please enlighten me on what's going wrong.

xuqiantong commented 5 years ago

You are still reading --iter=1 from the flagsfile. So you have two options: 1) append --iter=100 (or more) to your command line, or 2) change the value in your flagsfile.
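
For example, either of the following should do it (the command reuses the paths from earlier in this thread; --iter=100 is just a placeholder value, pick whatever you need):

    # Option 1: append --iter when invoking Train in continue mode
    sudo wav2letter/build/Train continue /home/satishyadav/speech/mytest/librispeech_clean_trainlogs \
        --flagsfile wav2letter/tutorials/1-librispeech_clean/train.cfg \
        --logtostderr=1 \
        --iter=100

    # Option 2: edit the flagsfile itself, changing the line
    #   --iter=1
    # in wav2letter/tutorials/1-librispeech_clean/train.cfg to
    #   --iter=100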

xuqiantong commented 5 years ago

Assuming the issue is solved. Closing.

SY-nc commented 5 years ago

Thanks @xuqiantong. I increased --iter to 17 and the dataset count to 14,000, then tried continuing. Just like before, it instantly printed 'Finished training'. Check out my output:

satishyadav@satishyadav-Dell:~/speech$ sudo wav2letter/build/Train continue /home/satishyadav/speech/mytest/librispeech_clean_trainlogs/ --flagsfile wav2letter/tutorials/1-librispeech_clean/train.cfg
[sudo] password for satishyadav: 
I0605 14:33:07.508425 12416 Train.cpp:78] Parsing command line flags
I0605 14:33:07.508476 12416 Train.cpp:79] Overriding flags should be mutable when using `continue`
I0605 14:33:07.508536 12416 Train.cpp:83] Reading flags from file wav2letter/tutorials/1-librispeech_clean/train.cfg
I0605 14:33:07.763628 12416 Train.cpp:136] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --arch=network.arch; --archdir=/home/satishyadav/speech/wav2letter/tutorials/1-librispeech_clean/; --attention=content; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=2500; --beamthreshold=25; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=/home/satishyadav/speech/mytest/; --dataorder=input; --decodertype=wrd; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/tutorials/1-librispeech_clean/train.cfg; --gamma=1; --garbage=false; --input=flac; --inputbinsize=100; --inputfeeding=false; --iter=17; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=; --linlr=-1; --linlrcrit=-1; --linseg=0; --listdata=false; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.10000000000000001; --lrcrit=0; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=4; --nthread_decoder=1; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=2; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/home/satishyadav/speech/mytest/; --runname=librispeech_clean_trainlogs; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=; --seed=0; --show=false; --showletters=false; --silweight=0; --smearing=none; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=|; --tag=; --target=tkn; --test=; --tokens=data/tokens.txt; --tokensdir=/home/satishyadav/speech/mytest/; --train=data/train-clean-100; --trainWithWindow=false; --transdiag=0; --unkweight=-inf; --usewordpiece=false; --valid=data/dev-clean; --weightdecay=0; --wordscore=1; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0605 14:33:07.763680 12416 Train.cpp:137] Experiment path: /home/satishyadav/speech/mytest/librispeech_clean_trainlogs/
I0605 14:33:07.763692 12416 Train.cpp:138] Experiment runidx: 2
I0605 14:33:07.784085 12416 Train.cpp:166] Number of classes (network): 31
I0605 14:33:09.112967 12416 Train.cpp:208] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> output]
    (0): View (-1 1 40 0)
    (1): Conv2D (40->256, 8x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (2): ReLU
    (3): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (4): ReLU
    (5): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (6): ReLU
    (7): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (8): ReLU
    (9): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (10): ReLU
    (11): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (12): ReLU
    (13): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (14): ReLU
    (15): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (16): ReLU
    (17): Reorder (2,0,3,1)
    (18): Linear (256->512) (with bias)
    (19): ReLU
    (20): Linear (512->31) (with bias)
I0605 14:33:09.113080 12416 Train.cpp:209] [Network Params: 3901471]
I0605 14:33:09.113107 12416 Train.cpp:210] [Criterion] ConnectionistTemporalClassificationCriterion
I0605 14:33:09.113126 12416 Train.cpp:218] [Network Optimizer] SGD
I0605 14:33:09.113142 12416 Train.cpp:219] [Criterion Optimizer] SGD
I0605 14:33:09.114269 12416 NumberedFilesLoader.cpp:29] Adding dataset /home/satishyadav/speech/mytest/data/train-clean-100 ...
I0605 14:33:09.114856 12416 NumberedFilesLoader.cpp:68] 14000 files found. 
I0605 14:33:56.344380 12416 Utils.cpp:102] Filtered 0/14000 samples
I0605 14:33:56.349743 12416 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 14000
I0605 14:33:56.349933 12416 NumberedFilesLoader.cpp:29] Adding dataset /home/satishyadav/speech/mytest/data/dev-clean ...
I0605 14:33:56.350111 12416 NumberedFilesLoader.cpp:68] 2703 files found. 
I0605 14:34:00.603824 12416 Utils.cpp:102] Filtered 0/2703 samples
I0605 14:34:00.605015 12416 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 2703
I0605 14:34:00.649971 12416 Train.cpp:640] Finished training

Please reopen the issue.

jacobkahn commented 5 years ago

@SYnchronYSe: can you print the value of FLAGS_iter when you start training, to confirm it matches what you've changed it to (17)? My guess is that the flags are being clobbered somehow; it might be that the configuration of the previous model is being read in.
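
One quick way to check from the shell, without touching the code, is to grep the "Gflags after parsing" dump that Train already prints (a minimal sketch; the pattern assumes the dump format shown in the logs above):

    # Prints the --iter value as parsed from the command line and flagsfile.
    # Caveat: this only shows the value at parse time; if the saved model's
    # configuration is applied afterwards, the effective value could still differ.
    sudo wav2letter/build/Train continue /home/satishyadav/speech/mytest/librispeech_clean_trainlogs \
        --flagsfile wav2letter/tutorials/1-librispeech_clean/train.cfg \
        --logtostderr=1 2>&1 | grep -o -- '--iter=[0-9]*'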

SY-nc commented 5 years ago

Thank you so much! It continued from the next epoch.

For some reason it was still picking up --iter=1. Maybe I forgot to save the flags file after updating it. Now it's working fine.