Closed bjl21012 closed 4 years ago
Use Test, not Decode.
Use Test, not Decode. Thank you. May I ask: when I try transcribing, the results are very poor. Is that because the sample set is too small? sample: 000005239, WER: 100%, LER: 601.852%, total WER: 100%, total LER: 417.802%, progress (thread 0): 99.9253%] |T|: 他 | 想 | 中 医 | 讲 | 阴 阳 | 调 和 | 阴 阳 | 不 | 协 调 | 不 平 衡 | 就 会 | 生 病 | 肩 周 炎 | 也 是 | 人 | 体 内 | 不 | 协 调 | 不 平 衡 | 的 | 表 现 |P|: 凌 潇 都 嶂 都 嶂 都 嶂 都 嶂 嶂 都 嶂 宙 嶂 宙 嶂 眨 嶂 当 都 沸 都 宙 当 都 当 沸 宙 沸 嶂 都 嶂 宙 当 宙 宙 都 嶂 拧 嶂 拧 嶂 拧 颜 当 拧 嶂 都 拧 嶂 都 当 都 都 当 宙 都 宙 当 都 嶂 宙 钥 眨 嶂 宙 嶂 颜 嶂 颜 拧 沸 嶂 都 当 都 嶂 颜 宙 当 嶂 当 都 嶂 嶂 都 宙 沸 当 祥 都 嶂 都 嶂 街 宙 嶂 都 嶂 嶂 沸 宙 都 当 都 当 嶂 当 都 嶂 当 祥 嶂 都 嶂 拧 嶂 沸 当 都 嶂 宙 嶂 都 都 嶂 都 嶂 沸 嶂 嶂 宙 嶂 都 宙 嶂 宙 眨 嶂 拧 嶂 宙 嶂 颜 嶂 都 拧 都 嶂 钥 眨 嶂 宙 嶂 拧 嶂 宙 嶂 都 当 宙 嶂 宙 都 拧 都 颜 嶂 沸 都 嶂 都 嶂 嶂 宙 当 嶂 当 嶂 当 当 都 嶂 沸 都 嶂 宙 嶂 街 拧 眨 嶂 宙 嶂 当 嶂 当 都 宙 都 嶂 都 嶂 当 都 嶂 都 当 都 都 宙 嶂 沸 嶂 当 嶂 当 沸 嶂 祥 披
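For reference, a minimal sketch of what "use Test, not Decode" could look like, built from the paths in the configs later in this thread. The binary name `Test` and these flags match the wav2letter build used elsewhere in the thread, but the exact flag set should be checked against your own build before running:

```
# Hypothetical invocation of wav2letter's Test binary (Viterbi path, no LM),
# instead of the Decode binary; all paths are taken from this thread's configs.
../../build/Test \
  --am=/home/bjl/data/thchtrainlogs/001_model_lists#devlist.lst.bin \
  --test=lists/devlist.lst \
  --datadir=/home/bjl/data \
  --tokens=/home/bjl/data/am/chinesetokens.txt \
  --lexicon=/home/bjl/data/am/lexicon3.txt \
  --show=true
```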
Hi @bjl21012,
are you sure your model is trained? From the log you provided, the model's WER and TER are 100%, which means it generates a completely wrong transcription. The lower the WER and TER, the better the model.
Here is my train command; is it OK?

../../build/Train train --flagsfile chinesetrain.cfg --logtostderr=1 --reportiters=1000

I0417 05:53:06.199527 6381 Train.cpp:59] Reading flags from file chinesetrain.cfg I0417 05:53:06.212417 6381 Train.cpp:148] Gflags after parsing --flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=network.arch; --archdir=/root/wav2letter/tutorials/1-librispeech_clean/; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=/home/bjl/data; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=chinesetrain.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=wav; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=100; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/home/bjl/data/am/lexicon3.txt; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.10000000000000001; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10;
--maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=4; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=1; --reportiters=1000; --rightWindowSize=50; --rndv_filepath=; --rundir=/home/bjl/data; --runname=thchtrainlogs; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=false; --showletters=false; --silscore=0; --smearing=none; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=9223372036854775807; --surround=|; --tag=; --target=tkn; --test=; --tokens=/home/bjl/data/am/chinesetokens.txt; --tokensdir=; --train=lists/trainlist.lst; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --use_memcache=false; --use_saug=false; --uselexicon=true; --usewordpiece=false; --valid=lists/devlist.lst; --warmup=8000; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; I0417 05:53:06.212961 6381 Train.cpp:149] Experiment path: /home/bjl/data/thchtrainlogs I0417 05:53:06.212975 6381 Train.cpp:150] Experiment runidx: 1 I0417 05:53:06.221945 6381 
Train.cpp:194] Number of classes (network): 2886 I0417 05:53:06.247638 6381 Train.cpp:201] Number of words: 8874 I0417 05:53:06.253541 6381 Train.cpp:215] Loading architecture file from /root/wav2letter/tutorials/1-librispeech_clean/network.arch I0417 05:53:07.011435 6381 Train.cpp:247] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> output] (0): View (-1 1 40 0) (1): Conv2D (40->256, 8x1, 2,1, SAME,SAME, 1, 1) (with bias) (2): ReLU (3): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (4): ReLU (5): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (6): ReLU (7): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (8): ReLU (9): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (10): ReLU (11): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (12): ReLU (13): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (14): ReLU (15): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias) (16): ReLU (17): Reorder (2,0,3,1) (18): Linear (256->512) (with bias) (19): ReLU (20): Linear (512->2886) (with bias) I0417 05:53:07.011883 6381 Train.cpp:248] [Network Params: 5366086] I0417 05:53:07.011919 6381 Train.cpp:249] [Criterion] ConnectionistTemporalClassificationCriterion I0417 05:53:07.011962 6381 Train.cpp:257] [Network Optimizer] SGD I0417 05:53:07.011981 6381 Train.cpp:258] [Criterion Optimizer] SGD I0417 05:53:07.165275 6381 W2lListFilesDataset.cpp:141] 8033 files found. I0417 05:53:07.165488 6381 Utils.cpp:102] Filtered 0/8033 samples I0417 05:53:07.165951 6381 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 2009 I0417 05:53:07.202600 6381 W2lListFilesDataset.cpp:141] 2677 files found. I0417 05:53:07.202692 6381 Utils.cpp:102] Filtered 0/2677 samples I0417 05:53:07.202842 6381 W2lListFilesDataset.cpp:62] Total batches (i.e. 
iters): 670 I0417 05:53:07.203177 6381 Train.cpp:555] Shuffling trainset I0417 05:53:07.203544 6381 Train.cpp:562] Epoch 1 started! I0417 05:53:22.064838 6381 Train.cpp:737] Finished training
You should set --iter=200000, which means training for 200k updates (around 100 epochs in your case). Right now you are training for only 100 updates, while a single epoch alone requires 2009 updates:
I0417 05:53:07.165951 6381 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 2009
Your WER is still 100%. WER is the word error rate, so 100% means every word in the recognition output is wrong; that is definitely not acceptable.
How should I adjust things for testing? Is the problem in my training configuration, or in the sample set?
Your --iter=25 is the problem: it is far, far too small, essentially no training at all, which is why WER = 100%.
I0417 05:53:07.165488 6381 Utils.cpp:102] Filtered 0/8033 samples
means your training set contains 8033 samples.
I0417 05:53:07.165951 6381 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 2009
means those 8033 samples are split into 2009 batches, because --batchsize=4 and 8033/4 = 2009 (rounded up). So each epoch takes 2009 updates. If you want to train for 10 epochs, that is 2009*10, i.e. set --iter=20090; rounding to --iter=20000 for convenience is fine too.
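The batch arithmetic above can be checked directly in shell; the numbers (8033 samples, --batchsize=4) come from the log lines quoted above:

```shell
samples=8033
batchsize=4
# updates (batches) per epoch, rounded up: ceil(8033 / 4) = 2009
updates_per_epoch=$(( (samples + batchsize - 1) / batchsize ))
echo "$updates_per_epoch"            # 2009
# --iter value for 10 epochs and for 100 epochs
echo $(( updates_per_epoch * 10 ))   # 20090
echo $(( updates_per_epoch * 100 ))  # 200900
```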
I have updated train.cfg and can now train for 100 epochs, but WER is still 100%. Is there any other problem with the config file? Thanks.
train.cfg:
--datadir=/home/bjl/data
--rundir=/home/bjl/data
--archdir=/root/wav2letter/tutorials/1-librispeech_clean/
--train=lists/trainlist.lst
--valid=lists/devlist.lst
--input=wav
--arch=network.arch
--tokens=/home/bjl/data/am/chinesetokens.txt
--lexicon=/home/bjl/data/am/lexicon3.txt
--criterion=ctc
--lr=0.0001
--lrcrit=0.0001
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=4
--batchsize=4
--runname=thchtrainlogs
--iter=200900
train log
epoch: 1 | nupdates: 2009 | lr: 0.000025 | lrcriterion: 0.000025 | runtime: 00:01:38 | bch(ms): 49.27 | smp(ms): 22.05 | fwd(ms): 14.58 | crit-fwd(ms): 10.86 | bwd(ms): 6.12 | optim(ms): 2.05 | loss: 480.48863 | train-TER: 296.07 | train-WER: 100.00 | lists/devlist.lst-loss: 479.15134 | lists/devlist.lst-TER: 307.74 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 756.71
epoch: 2 | nupdates: 4018 | lr: 0.000050 | lrcriterion: 0.000050 | runtime: 00:01:31 | bch(ms): 45.71 | smp(ms): 21.91 | fwd(ms): 12.74 | crit-fwd(ms): 10.52 | bwd(ms): 5.65 | optim(ms): 1.58 | loss: 472.03904 | train-TER: 288.09 | train-WER: 100.00 | lists/devlist.lst-loss: 463.91075 | lists/devlist.lst-TER: 99.85 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 815.77
epoch: 3 | nupdates: 6027 | lr: 0.000075 | lrcriterion: 0.000075 | runtime: 00:01:32 | bch(ms): 46.03 | smp(ms): 22.44 | fwd(ms): 12.75 | crit-fwd(ms): 10.53 | bwd(ms): 5.66 | optim(ms): 1.60 | loss: 437.50336 | train-TER: 99.95 | train-WER: 99.99 | lists/devlist.lst-loss: 395.35017 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 810.02
epoch: 95 | nupdates: 190855 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:32 | bch(ms): 46.08 | smp(ms): 22.62 | fwd(ms): 12.69 | crit-fwd(ms): 10.49 | bwd(ms): 5.64 | optim(ms): 1.59 | loss: 39.11445 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 39.09715 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 809.09
epoch: 96 | nupdates: 192864 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:30 | bch(ms): 45.05 | smp(ms): 21.63 | fwd(ms): 12.69 | crit-fwd(ms): 10.48 | bwd(ms): 5.63 | optim(ms): 1.58 | loss: 39.08187 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 39.07248 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 827.68
epoch: 97 | nupdates: 194873 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:31 | bch(ms): 45.66 | smp(ms): 22.15 | fwd(ms): 12.70 | crit-fwd(ms): 10.49 | bwd(ms): 5.63 | optim(ms): 1.61 | loss: 39.04955 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 39.03678 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 816.67
epoch: 98 | nupdates: 196882 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:31 | bch(ms): 45.36 | smp(ms): 21.93 | fwd(ms): 12.70 | crit-fwd(ms): 10.49 | bwd(ms): 5.64 | optim(ms): 1.58 | loss: 39.01797 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 39.00588 | lists/devlist.lst-TER: 100.00 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 822.04
epoch: 99 | nupdates: 198891 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:31 | bch(ms): 45.37 | smp(ms): 21.86 | fwd(ms): 12.71 | crit-fwd(ms): 10.50 | bwd(ms): 5.64 | optim(ms): 1.60 | loss: 38.98587 | train-TER: 100.00 | train-WER: 100.00 | lists/devlist.lst-loss: 38.96927 | lists/devlist.lst-TER: 99.99 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 821.82
epoch: 100 | nupdates: 200900 | lr: 0.000100 | lrcriterion: 0.000100 | runtime: 00:01:32 | bch(ms): 46.18 | smp(ms): 22.71 | fwd(ms): 12.71 | crit-fwd(ms): 10.49 | bwd(ms): 5.63 | optim(ms): 1.59 | loss: 38.95328 | train-TER: 99.99 | train-WER: 100.00 | lists/devlist.lst-loss: 38.93468 | lists/devlist.lst-TER: 99.99 | lists/devlist.lst-WER: 100.00 | avg-isz: 932 | avg-tsz: 059 | max-tsz: 077 | hrs: 20.81 | thrpt(sec/sec): 807.43
The training itself looks basically fine; you need to tune your training configuration. For example, your learning rate now looks like it may be a bit too small. You could also try other model architectures and optimizers.
I tried setting lr a bit higher, but anything above 0.13 throws an exception: Loss has NaN values. Samples - 000003562,000007945,000002127,000006786. If I should use another model, are there any you can recommend?
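One thing worth trying, as a guess rather than a maintainer recommendation: keep the base --lr below the NaN threshold and use the warmup flag that already appears in the flag dump earlier in this thread (--warmup). The values below are illustrative assumptions, not tested settings; check the exact warmup semantics in Train.cpp before relying on them:

```
# Hypothetical train.cfg additions (values are guesses, not tested)
# ramp the learning rate up over the first updates instead of starting at full lr
--warmup=8000
--lr=0.1
--lrcrit=0.1
# keep gradient clipping on to limit NaN blow-ups
--maxgradnorm=1.0
```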
Hi, I tried using the data_thchs30 dataset (about 33 hours) to train a Chinese model. Training succeeds, but decoding does not output any words, and I don't know what the problem is.
the train.cfg:
--datadir=/home/bjl/data
--rundir=/home/bjl/data
--archdir=/root/wav2letter/tutorials/1-librispeech_clean/
--train=lists/train-clean-100.lst
--valid=lists/dev-clean.lst
--input=flac
--arch=network.arch
--tokens=/home/bjl/data/am/tokens.txt
--lexicon=/home/bjl/data/am/lexicon.txt
--criterion=ctc
--lr=0.1
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=4
--batchsize=4
--runname=librispeech_clean_trainlogs
--iter=25
001_log
epoch: 1 | nupdates: 101 | lr: 0.001263 | lrcriterion: 0.000000 | runtime: 00:00:11 | bch(ms): 114.17 | smp(ms): 26.11 | fwd(ms): 47.44 | crit-fwd(ms): 17.50 | bwd(ms): 15.23 | optim(ms): 10.81 | loss: 473.99903 | train-TER: 318.04 | train-WER: 100.00 | lists/devlist.lst-loss: 472.35730 | lists/devlist.lst-TER: 415.86 | lists/devlist.lst-WER: 100.00 | avg-isz: 926 | avg-tsz: 059 | max-tsz: 077 | hrs: 1.04 | thrpt(sec/sec): 324.44
decoder.cfg
--lexicon=/home/bjl/data/am/lexicon3.txt
--lm=/home/bjl/data/am/cn_text.arpa
--am=/home/bjl/data/thchtrainlogs/001_model_lists#devlist.lst.bin
--test=lists/devlist.lst
--datadir=/home/bjl/data/
--sclite=/home/bjl/data
--lmweight=2.5
--input=wav
--wordscore=1
--beamsize=500
--beamthreshold=25
--silweight=-0.5
--nthread_decoder=4
--smearing=max
--show=true