Closed bjl21012 closed 4 years ago
Use Test, not Decode.
Use Test, not Decode. Thank you. May I ask: when I try transcribing, the results are very poor. Is that because the sample set is too small? sample: 000005239, WER: 100%, LER: 601.852%, total WER: 100%, total LER: 417.802%, progress (thread 0): 99.9253%] |T|: 他 | 想 | 中 医 | 讲 | 阴 阳 | 调 和 | 阴 阳 | 不 | 协 调 | 不 平 衡 | 就 会 | 生 病 | 肩 周 炎 | 也 是 | 人 | 体 内 | 不 | 协 调 | 不 平 衡 | 的 | 表 现 |P|: 凌 潇 都 嶂 都 嶂 都 嶂 都 嶂 嶂 都 嶂 宙 嶂 宙 嶂 眨 嶂 当 都 沸 都 宙 当 都 当 沸 宙 沸 嶂 都 嶂 宙 当 宙 宙 都 嶂 拧 嶂 拧 嶂 拧 颜 当 拧 嶂 都 拧 嶂 都 当 都 都 当 宙 都 宙 当 都 嶂 宙 钥 眨 嶂 宙 嶂 颜 嶂 颜 拧 沸 嶂 都 当 都 嶂 颜 宙 当 嶂 当 都 嶂 嶂 都 宙 沸 当 祥 都 嶂 都 嶂 街 宙 嶂 都 嶂 嶂 沸 宙 都 当 都 当 嶂 当 都 嶂 当 祥 嶂 都 嶂 拧 嶂 沸 当 都 嶂 宙 嶂 都 都 嶂 都 嶂 沸 嶂 嶂 宙 嶂 都 宙 嶂 宙 眨 嶂 拧 嶂 宙 嶂 颜 嶂 都 拧 都 嶂 钥 眨 嶂 宙 嶂 拧 嶂 宙 嶂 都 当 宙 嶂 宙 都 拧 都 颜 嶂 沸 都 嶂 都 嶂 嶂 宙 当 嶂 当 嶂 当 当 都 嶂 沸 都 嶂 宙 嶂 街 拧 眨 嶂 宙 嶂 当 嶂 当 都 宙 都 嶂 都 嶂 当 都 嶂 都 当 都 都 宙 嶂 沸 嶂 当 嶂 当 沸 嶂 祥 披
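For reference, a minimal sketch of what "use Test, not Decode" could look like, built from the paths in the configs later in this thread. The binary name `Test` and these flags match the wav2letter build used elsewhere in the thread, but the exact flag set should be checked against your own build before running:

```
# Hypothetical invocation of wav2letter's Test binary (Viterbi path, no LM),
# instead of the Decode binary; all paths are taken from this thread's configs.
../../build/Test \
  --am=/home/bjl/data/thchtrainlogs/001_model_lists#devlist.lst.bin \
  --test=lists/devlist.lst \
  --datadir=/home/bjl/data \
  --tokens=/home/bjl/data/am/chinesetokens.txt \
  --lexicon=/home/bjl/data/am/lexicon3.txt \
  --show=true
```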
Hi @bjl21012,
are you sure your model is trained? From the log you provided, the model's WER and TER are 100%, which means it generates a completely wrong transcription. The lower the WER and TER, the better the model.
Here is my train command; is it OK?

../../build/Train train --flagsfile chinesetrain.cfg --logtostderr=1 --reportiters=1000

I0417 05:53:06.199527 6381 Train.cpp:59] Reading flags from file chinesetrain.cfg I0417 05:53:06.212417 6381 Train.cpp:148] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=network.arch; --archdir=/root/wav2letter/tutorials/1-librispeech_clean/; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=/home/bjl/data; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=chinesetrain.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=wav; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=100; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/home/bjl/data/am/lexicon3.txt; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.10000000000000001; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10;
--maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=4; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=1; --reportiters=1000; --rightWindowSize=50; --rndv_filepath=; --rundir=/home/bjl/data; --runname=thchtrainlogs; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=false; --showletters=false; --silscore=0; --smearing=none; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=9223372036854775807; --surround=|; --tag=; --target=tkn; --test=; --tokens=/home/bjl/data/am/chinesetokens.txt; --tokensdir=; --train=lists/trainlist.lst; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --use_memcache=false; --use_saug=false; --uselexicon=true; --usewordpiece=false; --valid=lists/devlist.lst; --warmup=8000; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0417 05:53:06.212961 6381 Train.cpp:149] Experiment path: /home/bjl/data/thchtrainlogs I0417 05:53:06.212975 6381 Train.cpp:150] Experiment runidx: 1 I0417 05:53:06.221945 6381 
Train.cpp:194] Number of classes (network): 2886 I0417 05:53:06.247638 6381 Train.cpp:201] Number of words: 8874 I0417 05:53:06.253541 6381 Train.cpp:215] Loading architecture file from /root/wav2letter/tutorials/1-librispeech_clean/network.arch I0417 05:53:07.011435 6381 Train.cpp:247] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> output] (0): View (-1 1 40 0) (1): Conv2D (40->256, 8x1, 2,1, SAME,SAME, 1, 1) (with bias) (2): ReLU (3): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (4): ReLU (5): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (6): ReLU (7): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (8): ReLU (9): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (10): ReLU (11): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (12): ReLU (13): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (14): ReLU (15): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (16): ReLU (17): Reorder (2,0,3,1) (18): Linear (256->512) (with bias) (19): ReLU (20): Linear (512->2886) (with bias) I0417 05:53:07.011883 6381 Train.cpp:248] [Network Params: 5366086] I0417 05:53:07.011919 6381 Train.cpp:249] [Criterion] ConnectionistTemporalClassificationCriterion I0417 05:53:07.011962 6381 Train.cpp:257] [Network Optimizer] SGD I0417 05:53:07.011981 6381 Train.cpp:258] [Criterion Optimizer] SGD I0417 05:53:07.165275 6381 W2lListFilesDataset.cpp:141] 8033 files found. I0417 05:53:07.165488 6381 Utils.cpp:102] Filtered 0/8033 samples I0417 05:53:07.165951 6381 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 2009 I0417 05:53:07.202600 6381 W2lListFilesDataset.cpp:141] 2677 files found. I0417 05:53:07.202692 6381 Utils.cpp:102] Filtered 0/2677 samples I0417 05:53:07.202842 6381 W2lListFilesDataset.cpp:62] Total batches (i.e. 
iters): 670 I0417 05:53:07.203177 6381 Train.cpp:555] Shuffling trainset I0417 05:53:07.203544 6381 Train.cpp:562] Epoch 1 started! I0417 05:53:22.064838 6381 Train.cpp:737] Finished training
You should set --iter=200000, which means training for 200k updates (around 100 epochs in your case). Right now you are training for only 100 updates, while a single epoch alone requires 2009 updates:
I0417 05:53:07.165951 6381 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 2009
Your WER is still 100%. WER is the word error rate, so 100% means every word in the recognition output is wrong; that is definitely not acceptable.
How should I adjust things for testing? Is the problem in my training configuration, or in the sample set?
Your --iter=25 is the problem: it is far, far too small, essentially no training at all, which is why WER = 100%.
I0417 05:53:07.165488 6381 Utils.cpp:102] Filtered 0/8033 samples
means your training set contains 8033 samples.
I0417 05:53:07.165951 6381 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 2009
means those 8033 samples are split into 2009 batches, because --batchsize=4 and 8033/4 = 2009 (rounded up). So each epoch takes 2009 updates. If you want to train for 10 epochs, that is 2009*10, i.e. set --iter=20090; rounding to --iter=20000 for convenience is fine too.
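The batch arithmetic above can be checked directly in shell; the numbers (8033 samples, --batchsize=4) come from the log lines quoted above:

```shell
samples=8033
batchsize=4
# updates (batches) per epoch, rounded up: ceil(8033 / 4) = 2009
updates_per_epoch=$(( (samples + batchsize - 1) / batchsize ))
echo "$updates_per_epoch"            # 2009
# --iter value for 10 epochs and for 100 epochs
echo $(( updates_per_epoch * 10 ))   # 20090
echo $(( updates_per_epoch * 100 ))  # 200900
```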
I have updated train.cfg and can now train for 100 epochs, but WER is still 100%. Is there any other problem with the config file? Thanks.
train.cfg:
--datadir=/home/bjl/data
--rundir=/home/bjl/data
--archdir=/root/wav2letter/tutorials/1-librispeech_clean/
--train=lists/trainlist.lst
--valid=lists/devlist.lst
--input=wav
--arch=network.arch
--tokens=/home/bjl/data/am/chinesetokens.txt
--lexicon=/home/bjl/data/am/lexicon3.txt
--criterion=ctc
--lr=0.0001
--lrcrit=0.0001
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=4
--batchsize=4
--runname=thchtrainlogs
--iter=200900
train log
epoch: 1 | nupdates: 2009 | lr: 0.000025 | lrcriterion: 0.000025 | runtime: 00:01:38 | bch(ms): 49.27 | smp(ms): 22.05 | fwd(ms): 14.58 | crit-fwd(ms): 10.86 | bwd(ms): 6.12 | optim(ms): 2.05 | loss: 480.48863 | train-TER: 296.07 | train-WER: 100.00 | lists/devlist.lst-loss: 479.15134 | lists/devlist.lst-TER: 307.74 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 756.71
epoch: 2 | nupdates: 4018 | lr: 0.000050 | lrcriterion: 0.000050 | runtime: 00:01:31 | bch(ms): 45.71 | smp(ms): 21.91 | fwd(ms): 12.74 | crit-fwd(ms): 10.52 | bwd(ms): 5.65 | optim(ms): 1.58 | loss: 472.03904 | train-TER: 288.09 | train-WER: 100.00 | lists/devlist.lst-loss: 463.91075 | lists/devlist.lst-TER: 99.85 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 815.77
epoch: 3 | nupdates: 6027 | lr: 0.000075 | lrcriterion: 0.000075 | runtime: 00:01:32 | bch(ms): 46.03 | smp(ms): 22.44 | fwd(ms): 12.75 | crit-fwd(ms): 10.53 | bwd(ms): 5.66 | optim(ms): 1.60 | loss: 437.50336 | train-TER: 99.95 | train-WER: 99.99 | lists/devlist.lst-loss: 395.35017 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 810.02
epoch: 95 | nupdates: 190855 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:32 | bch(ms): 46.08 | smp(ms): 22.62 | fwd(ms): 12.69 | crit-fwd(ms): 10.49 | bwd(ms): 5.64 | optim(ms): 1.59 | loss: 39.11445 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 39.09715 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 809.09
epoch: 96 | nupdates: 192864 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:30 | bch(ms): 45.05 | smp(ms): 21.63 | fwd(ms): 12.69 | crit-fwd(ms): 10.48 | bwd(ms): 5.63 | optim(ms): 1.58 | loss: 39.08187 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 39.07248 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 827.68
epoch: 97 | nupdates: 194873 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:31 | bch(ms): 45.66 | smp(ms): 22.15 | fwd(ms): 12.70 | crit-fwd(ms): 10.49 | bwd(ms): 5.63 | optim(ms): 1.61 | loss: 39.04955 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 39.03678 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 816.67
epoch: 98 | nupdates: 196882 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:31 | bch(ms): 45.36 | smp(ms): 21.93 | fwd(ms): 12.70 | crit-fwd(ms): 10.49 | bwd(ms): 5.64 | optim(ms): 1.58 | loss: 39.01797 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 39.00588 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 822.04
epoch: 99 | nupdates: 198891 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:31 | bch(ms): 45.37 | smp(ms): 21.86 | fwd(ms): 12.71 | crit-fwd(ms): 10.50 | bwd(ms): 5.64 | optim(ms): 1.60 | loss: 38.98587 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 38.96927 | lists/devlist.lst-TER: 99.99 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 821.82
epoch: 100 | nupdates: 200900 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:32 | bch(ms): 46.18 | smp(ms): 22.71 | fwd(ms): 12.71 | crit-fwd(ms): 10.49 | bwd(ms): 5.63 | optim(ms): 1.59 | loss: 38.95328 | train-TER: 99.99 | train-WER: 100.00 | lists/devlist.lst-loss: 38.93468 | lists/devlist.lst-TER: 99.99 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 807.43
The training itself looks basically fine; you need to tune your training configuration. For example, your learning rate now looks like it may be a bit too small. You could also try other model architectures and optimizers.
I tried setting lr a bit higher, but anything above 0.13 throws an exception: Loss has NaN values. Samples - 000003562,000007945,000002127,000006786. If I should use another model, are there any you can recommend?
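One thing worth trying, as a guess rather than a maintainer recommendation: keep the base --lr below the NaN threshold and use the warmup flag that already appears in the flag dump earlier in this thread (--warmup). The values below are illustrative assumptions, not tested settings; check the exact warmup semantics in Train.cpp before relying on them:

```
# Hypothetical train.cfg additions (values are guesses, not tested)
# ramp the learning rate up over the first updates instead of starting at full lr
--warmup=8000
--lr=0.1
--lrcrit=0.1
# keep gradient clipping on to limit NaN blow-ups
--maxgradnorm=1.0
```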
Hi, I tried using the data_thchs30 dataset (about 33 hours) to train a Chinese model. Training succeeds, but decoding does not output any words, and I don't know what the problem is.
the train.cfg:
--datadir=/home/bjl/data
--rundir=/home/bjl/data
--archdir=/root/wav2letter/tutorials/1-librispeech_clean/
--train=lists/train-clean-100.lst
--valid=lists/dev-clean.lst
--input=flac
--arch=network.arch
--tokens=/home/bjl/data/am/tokens.txt
--lexicon=/home/bjl/data/am/lexicon.txt
--criterion=ctc
--lr=0.1
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=4
--batchsize=4
--runname=librispeech_clean_trainlogs
--iter=25
001_log
epoch: 1 | nupdates: 101 | lr: 0.001263 | lrcriterion: 0.000000 | runtime: 00:00:11 | bch(ms): 114.17 | smp(ms): 26.11 | fwd(ms): 47.44 | crit-fwd(ms): 17.50 | bwd(ms): 15.23 | optim(ms): 10.81 | loss: 473.99903 | train-TER: 318.04 | train-WER: 100.00 | lists/devlist.lst-loss: 472.35730 | lists/devlist.lst-TER: 415.86 | lists/devlist.lst-WER: 100.00 | avg-isz: 926 | avg-tsz: 059 | max-tsz: 077 | hrs: 1.04 | thrpt(sec/sec): 324.44
decoder.cfg
--lexicon=/home/bjl/data/am/lexicon3.txt
--lm=/home/bjl/data/am/cn_text.arpa
--am=/home/bjl/data/thchtrainlogs/001_model_lists#devlist.lst.bin
--test=lists/devlist.lst
--datadir=/home/bjl/data/
--sclite=/home/bjl/data
--lmweight=2.5
--input=wav
--wordscore=1
--beamsize=500
--beamthreshold=25
--silweight=-0.5
--nthread_decoder=4
--smearing=max
--show=true