flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Could not start training 1-librispeech_clean as per given instructions #734

Open tumusudheer opened 4 years ago

tumusudheer commented 4 years ago

Hi,

OS: Ubuntu 18.04
Flashlight and wav2letter: compiled from source (master branch, just pulled today)
Cuda: 10.2 (installed from cuda-repo-ubuntu1804-10-2-local-10.2.89-440.33.01_1.0-1_amd64.deb)
cudnn: 7.6.5

I'm following instructions as given here: https://github.com/facebookresearch/wav2letter/tree/master/tutorials/1-librispeech_clean

After processing the data, I tried running the following command:

build/Train train --flagsfile tutorials/1-librispeech_clean/train.cfg

My train.cfg is as follows:

# Training config for Mini Librispeech
# Replace `[...]` with appropriate paths
--datadir=/data/Self/placeholder/facebook/data/
--rundir=/data/Self/placeholder/facebook/data/run
--archdir=tutorials/1-librispeech_clean/
--train=lists/train-clean-100.lst
--valid=lists/dev-clean.lst
--input=flac
--arch=network.arch
--tokens=/data/Self/placeholder/facebook/data/am/tokens.txt
--lexicon=/data/Self/placeholder/facebook/data/am/lexicon.txt
--criterion=ctc
--lr=0.1
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=1
--batchsize=4
--runname=librispeech_clean_trainlogs
--iter=25

I'm getting the following output and training does not even start:

Memory Manager Stats
MemoryManager type: CachingMemoryManager
Number of allocated blocks:20
Size of free block pool (small):16
Size of free block pool (large):9
Total native mallocs:19
Total native frees:0

I tried with lr=0.0001, but I still see the same issue.

I also tried to run training with the parameters passed on the command line, using the given command:

build/Train train --tokensdir=/data/Self/placeholder/facebook/data/am --tokens=tokens.txt --runname=librispeech -rundir=/data/Self/placeholder/facebook/data/run --archdir=tutorials/1-librispeech_clean/ --arch=network.arch --lexicon=/data/Self/placeholder/facebook/data/am/lexicon.txt --train=/data/Self/placeholder/facebook/data/lists/train-clean-100.lst --valid=/data/Self/placeholder/facebook/data/lists/dev-clean.lst --criterion=ctc --lr=0.0001 --maxgradnorm=1.0 --surround=| --sqnorm=true --mfsc=true --filterbanks=40 --batchsize=4 --iter=25

I'm getting the following error:

--sqnorm=true: command not found
F0705 00:00:16.203802 21785 Train.cpp:616] Loss has NaN values. Samples - train-clean-100-2092-145706-0045
*** Check failure stack trace: ***
    @     0x7f4c0e0f50cd  google::LogMessage::Fail()
    @     0x7f4c0e0f6f33  google::LogMessage::SendToLog()
    @     0x7f4c0e0f4c28  google::LogMessage::Flush()
    @     0x7f4c0e0f7999  google::LogMessageFatal::~LogMessageFatal()
    @     0x558563c5a767  _ZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEES_INS0_19FirstOrderOptimizerEES9_ddblE3_clES2_S5_S7_S9_S9_ddbl
    @     0x558563bf9be8  main
    @     0x7f4c0d1abb97  __libc_start_main
    @     0x558563c53baa  _start

Running ldd build/Train shows the following:

linux-vdso.so.1 (0x00007fff43782000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f2c0e3a6000)
        libwarpctc.so => /home/tumu/Self/Research/Work/facebook/wav2letter/wav2letter/build/src/third_party/warpctc/libwarpctc.so (0x00007f2c0dcf3000)
        libgflags.so.2.2 => /usr/lib/x86_64-linux-gnu/libgflags.so.2.2 (0x00007f2c0dace000)
        libfftw3.so.3 => /usr/lib/x86_64-linux-gnu/libfftw3.so.3 (0x00007f2c0d6cc000)
        libmkl_gf_lp64.so => /opt/intel/mkl/lib/intel64/libmkl_gf_lp64.so (0x00007f2c0cb5d000)
        libmkl_sequential.so => /opt/intel/mkl/lib/intel64/libmkl_sequential.so (0x00007f2c0b536000)
        libmkl_core.so => /opt/intel/mkl/lib/intel64/libmkl_core.so (0x00007f2c071b6000)
        libsndfile.so.1 => /usr/lib/x86_64-linux-gnu/libsndfile.so.1 (0x00007f2c06f3d000)
        libflashlight.so => /home/tumu/Self/Research/Work/facebook/flashlight_dist/lib/libflashlight.so (0x00007f2c06a33000)
        libafcuda.so.3 => /home/tumu/Self/Research/Work/facebook/arrayfire/lib/libafcuda.so.3 (0x00007f2beae50000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f2beac31000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f2beaa2d000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f2bea825000)
        libglog.so.0 => /usr/lib/x86_64-linux-gnu/libglog.so.0 (0x00007f2bea5f4000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f2bea26b000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f2be9ecd000)
        libgomp.so.1 => /usr/lib/x86_64-linux-gnu/libgomp.so.1 (0x00007f2be9c9e000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f2be9a86000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2be9695000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f2c0eea3000)
        libFLAC.so.8 => /usr/lib/x86_64-linux-gnu/libFLAC.so.8 (0x00007f2be941e000)
        libogg.so.0 => /usr/lib/x86_64-linux-gnu/libogg.so.0 (0x00007f2be9215000)
        libvorbis.so.0 => /usr/lib/x86_64-linux-gnu/libvorbis.so.0 (0x00007f2be8fea000)
        libvorbisenc.so.2 => /usr/lib/x86_64-linux-gnu/libvorbisenc.so.2 (0x00007f2be8d41000)
        libnccl.so.2 => /usr/local/cuda-10.2/lib64/libnccl.so.2 (0x00007f2be4182000)
        libmpi_cxx.so.20 => /usr/lib/x86_64-linux-gnu/libmpi_cxx.so.20 (0x00007f2be3f68000)
        libmpi.so.20 => /usr/lib/x86_64-linux-gnu/libmpi.so.20 (0x00007f2be3c76000)
        libcudnn.so.7 => /usr/local/cuda-10.2/lib64/libcudnn.so.7 (0x00007f2bc838c000)
        libnvrtc.so.10.2 => /usr/local/cuda-10.2/lib64/libnvrtc.so.10.2 (0x00007f2bc6bdf000)
        libcuda.so.1 => /usr/lib/x86_64-linux-gnu/libcuda.so.1 (0x00007f2bc59f7000)
        libunwind.so.8 => /usr/lib/x86_64-linux-gnu/libunwind.so.8 (0x00007f2bc57dc000)
        libopen-rte.so.20 => /usr/lib/x86_64-linux-gnu/libopen-rte.so.20 (0x00007f2bc5554000)
        libopen-pal.so.20 => /usr/lib/x86_64-linux-gnu/libopen-pal.so.20 (0x00007f2bc52a2000)
        libhwloc.so.5 => /usr/lib/x86_64-linux-gnu/libhwloc.so.5 (0x00007f2bc5065000)
        liblzma.so.5 => /lib/x86_64-linux-gnu/liblzma.so.5 (0x00007f2bc4e3f000)
        libutil.so.1 => /lib/x86_64-linux-gnu/libutil.so.1 (0x00007f2bc4c3c000)
        libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 (0x00007f2bc4a31000)
        libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 (0x00007f2bc4827000)

Am I missing something or doing something wrong? Please suggest how to resolve these issues.

Thank you

tumusudheer commented 4 years ago

I built wav2letter on another one of my machines (Ubuntu 18.04, CUDA 10.1).

After I typed this command: ./build/Train train --flagsfile tutorials/1-librispeech_clean/train.cfg

I'm getting the same issue as I mentioned above. The console output is as follows:

Memory Manager Stats
MemoryManager type: CachingMemoryManager
Number of allocated blocks:20
Size of free block pool (small):18
Size of free block pool (large):12
Total native mallocs:19
Total native frees:0

And then the process ends; I don't get any errors, and the training process doesn't start at all.

My train.cfg looks as follows:

# Training config for Mini Librispeech
# Replace `[...]` with appropriate paths
--datadir=/home/user01/Self/Research/Work/facebook/data
--rundir=/data/user01/facebook/wav2letter/train_runs
--archdir=/data/user01/facebook/wav2letter/tutorials/1-librispeech_clean/
--train=lists/train-clean-100.lst
--valid=lists/dev-clean.lst
--input=flac
--arch=network.arch
--tokens=/home/user01/Self/Research/Work/facebook/data/am/tokens.txt
--lexicon=/home/user01/Self/Research/Work/facebook/data/am/lexicon.txt
--criterion=ctc
--lr=0.1
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=4
--batchsize=4
--runname=librispeech_clean_trainlogs
--iter=25

When I looked at my rundir, the subdirectory librispeech_clean_trainlogs/ contains the following files:

total 30512
-rw-rw-r-- 1 user01 user01     4578 Jul  5 13:33 001_config
-rw-rw-r-- 1 user01 user01      428 Jul  5 13:33 001_perf
-rw-rw-r-- 1 user01 user01      460 Jul  5 13:33 001_log
-rw-rw-r-- 1 user01 user01 15610682 Jul  5 13:33 001_model_last.bin
-rw-rw-r-- 1 user01 user01 15610682 Jul  5 13:33 001_model_lists#dev-clean.lst.bin

The contents of 001_perf are:

# date  time    epoch   nupdates    lr  lrcriterion runtime bch(ms) smp(ms) fwd(ms) crit-fwd(ms)    bwd(ms) optim(ms)   loss    train-TER   train-WER   lists/dev-clean.lst-loss    lists/dev-clean.lst-TER lists/dev-clean.lst-WER avg-isz avg-tsz max-tsz hrs thrpt(sec/sec)
2020-07-05 13:33:57        1           26 0.100000 0.000000 00:00:04 173.38 4.70 97.43 38.25 24.85 20.42   56.91331 99.95 100.00   40.77565 100.00 100.00 1301 217 299    0.38 300.30

With the following command: ./build/Train train --flagsfile tutorials/1-librispeech_clean/train.cfg --logtostderr=1 --minloglevel=0

I get the following output:

I0705 15:39:30.304139  8436 Train.cpp:59] Reading flags from file tutorials/1-librispeech_clean/train.cfg
I0705 15:39:30.309223  8436 Train.cpp:151] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --am_decoder_tr_dropout=0; --am_decoder_tr_layerdrop=0; --am_decoder_tr_layers=1; --arch=network.arch; --archdir=/data/user01/Self/facebook/wav2letter/tutorials/1-librispeech_clean/; --attention=content; --attentionthreshold=2147483647; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=4; --beamsize=2500; --beamsizetoken=250000; --beamthreshold=25; --blobdata=false; --channels=1; --criterion=ctc; --critoptim=sgd; --datadir=/home/user01/Suki/Research/Work/Self/facebook/data; --dataorder=input; --decoderattnround=1; --decoderdropout=0; --decoderrnnlayer=1; --decodertype=wrd; --devwin=0; --emission_dir=; --emission_queue_size=3000; --enable_distributed=false; --encoderdim=0; --eosscore=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=tutorials/1-librispeech_clean/train.cfg; --framesizems=25; --framestridems=10; --gamma=1; --gumbeltemperature=1; --input=flac; --inputbinsize=100; --inputfeeding=false; --isbeamdump=false; --iter=25; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/home/user01/Suki/Research/Work/Self/facebook/data/am/lexicon.txt; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=0; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.10000000000000001; --lr_decay=9223372036854775807; --lr_decay_step=9223372036854775807; --lrcosine=false; --lrcrit=0; --max_devices_per_node=8; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=4; --nthread_decoder=1; --nthread_decoder_am_forward=1; --numattnhead=8; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=1; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/data/user01/Self/facebook/wav2letter/train_runs; --runname=librispeech_clean_trainlogs; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --saug_fmaskf=27; --saug_fmaskn=2; --saug_start_update=-1; --saug_tmaskn=2; --saug_tmaskp=1; --saug_tmaskt=100; --sclite=; --seed=0; --show=false; --showletters=false; --silscore=0; --smearing=none; --smoothingtemperature=1; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=9223372036854775807; --surround=|; --tag=; --target=tkn; --test=; --tokens=/home/user01/Suki/Research/Work/Self/facebook/data/am/tokens.txt; --tokensdir=; --train=lists/train-clean-100.lst; --trainWithWindow=false; --transdiag=0; --unkscore=-inf; --use_memcache=false; --uselexicon=true; --usewordpiece=false; --valid=lists/dev-clean.lst; --warmup=1; --weightdecay=0; --wordscore=0; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; 
--logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0705 15:39:30.309458  8436 Train.cpp:152] Experiment path: /data/user01/Self/facebook/wav2letter/train_runs/librispeech_clean_trainlogs
I0705 15:39:30.309460  8436 Train.cpp:153] Experiment runidx: 1
I0705 15:39:30.309729  8436 Train.cpp:199] Number of classes (network): 30
I0705 15:39:30.341761  8436 Train.cpp:206] Number of words: 34795
I0705 15:39:30.350679  8436 Train.cpp:220] Loading architecture file from /data/user01/Self/facebook/wav2letter/tutorials/1-librispeech_clean/network.arch
I0705 15:39:30.885213  8436 Train.cpp:252] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> output]
    (0): View (-1 1 40 0)
    (1): Conv2D (40->256, 8x1, 2,1, SAME,SAME, 1, 1) (with bias)
    (2): ReLU
    (3): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (4): ReLU
    (5): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (6): ReLU
    (7): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (8): ReLU
    (9): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (10): ReLU
    (11): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (12): ReLU
    (13): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (14): ReLU
    (15): Conv2D (256->256, 8x1, 1,1, SAME,SAME, 1, 1) (with bias)
    (16): ReLU
    (17): Reorder (2,0,3,1)
    (18): Linear (256->512) (with bias)
    (19): ReLU
    (20): Linear (512->30) (with bias)
I0705 15:39:30.885252  8436 Train.cpp:253] [Network Params: 3900958]
I0705 15:39:30.885257  8436 Train.cpp:254] [Criterion] ConnectionistTemporalClassificationCriterion
I0705 15:39:30.885267  8436 Train.cpp:262] [Network Optimizer] SGD
I0705 15:39:30.885269  8436 Train.cpp:263] [Criterion Optimizer] SGD
I0705 15:39:31.180352  8436 W2lListFilesDataset.cpp:141] 28539 files found. 
I0705 15:39:31.180716  8436 Utils.cpp:102] Filtered 0/28539 samples
I0705 15:39:31.182535  8436 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 7135
I0705 15:39:31.215492  8436 W2lListFilesDataset.cpp:141] 2703 files found. 
I0705 15:39:31.215519  8436 Utils.cpp:102] Filtered 0/2703 samples
I0705 15:39:31.215649  8436 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 676
I0705 15:39:31.216138  8436 Train.cpp:564] Shuffling trainset
I0705 15:39:31.216651  8436 Train.cpp:571] Epoch 1 started!
I0705 15:39:45.632812  8436 Train.cpp:345] epoch:        1 | nupdates:           26 | lr: 0.100000 | lrcriterion: 0.000000 | runtime: 00:00:05 | bch(ms): 204.63 | smp(ms): 5.18 | fwd(ms): 127.09 | crit-fwd(ms): 36.39 | bwd(ms): 25.18 | optim(ms): 20.58 | loss:   56.91273 | train-TER: 99.95 | train-WER: 100.00 | lists/dev-clean.lst-loss:   40.77376 | lists/dev-clean.lst-TER: 100.00 | lists/dev-clean.lst-WER: 100.00 | avg-isz: 1301 | avg-tsz: 217 | max-tsz: 299 | hrs:    0.38 | thrpt(sec/sec): 254.44
Memory Manager Stats
MemoryManager type: CachingMemoryManager
Number of allocated blocks:20
Size of free block pool (small):18
Size of free block pool (large):12
Total native mallocs:19
Total native frees:0
I0705 15:39:45.666921  8436 Train.cpp:747] Finished training

I also tried to run the SOTA 2019 LibriSpeech train_am_tds_s2s.cfg. I prepared the data using the command python3 prepare.py --dst, with dst set to some data directory.

And my train.cfg is as follows:

# Replace `[...]`, `[MODEL_DST]`, `[DATA_DST]`, with appropriate paths
--runname=am_tds_s2s_librispeech
--rundir=/data/user01/Self/facebook/wav2letter/data/sota/run/
--archdir=/data/user01/Self/facebook/wav2letter/recipes/models/sota/2019/
--arch=am_arch/am_tds_s2s.arch
--tokensdir=/data/user01/Self/facebook/wav2letter/data/sota/models/am
--tokens=librispeech-train-all-unigram-10000.tokens
--lexicon=/data/user01/Self/facebook/wav2letter/data/sota/models/am/librispeech-train+dev-unigram-10000-nbest10.lexicon
--train=/data/user01/Self/facebook/wav2letter/data/sota/data/lists/train-clean-100.lst,/data/user01/Self/facebook/wav2letter/data/sota/data/lists/train-clean-360.lst,/data/user01/Self/facebook/wav2letter/data/sota/data/lists/train-other-500.lst
--valid=dev-clean:/data/user01/Self/facebook/wav2letter/data/sota/data/lists/dev-clean.lst,dev-other:/data/user01/Self/facebook/wav2letter/data/sota/data/lists/dev-other.lst
--batchsize=2
--lr=0.06
--lrcrit=0.06
--momentum=0.5
--maxgradnorm=15
--mfsc=true
--nthread=6
--criterion=seq2seq
--maxdecoderoutputlen=120
--labelsmooth=0.05
--dataorder=output_spiral
--inputbinsize=25
--attnWindow=softPretrain
--softwstd=4
--trainWithWindow=true
--pretrainWindow=3
--attention=keyvalue
--encoderdim=512
--memstepsize=8338608
--eostoken=true
--pcttraineval=1
--pctteacherforcing=99
--listdata=true
--usewordpiece=true
--wordseparator=_
--target=ltr
--filterbanks=80
--stepsize=150
--gamma=0.5
--sampletarget=0.01
--enable_distributed=true
--iter=600
--framesizems=30
--framestridems=10
--decoderdropout=0.1
--decoderattnround=2
--decoderrnnlayer=3
--seed=2

After 2 epochs, training stops and I see the following output:

Memory Manager Stats
MemoryManager type: CachingMemoryManager
Number of allocated blocks:455
Size of free block pool (small):111
Size of free block pool (large):138
Total native mallocs:242
Total native frees:0
I0705 17:22:56.667855 17991 Train.cpp:747] Finished training

Training command:

./build/Train train --flagsfile recipes/models/sota/2019/librispeech/train_am_tds_s2s.cfg --logtostderr=1 --minloglevel=0

The complete log of the SOTA training is attached here: sota_tds_s2s.log

tlikhomanenko commented 4 years ago

Hi,

This is just memory-manager info for debugging; you can skip it when trying to understand what is happening with training.

Memory Manager Stats
MemoryManager type: CachingMemoryManager
Number of allocated blocks:20
Size of free block pool (small):18
Size of free block pool (large):12
Total native mallocs:19
Total native frees:0

To see the per-epoch training log, use --logtostderr=1 when you run training. Note that --iter=25 means 25 updates, not 25 epochs, so set this value higher, say to 1000000, for longer training.
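
For example (assuming the same train.cfg as above), you could change --iter=25 to something like --iter=1000000 in train.cfg and rerun with logging enabled:

./build/Train train --flagsfile tutorials/1-librispeech_clean/train.cfg --logtostderr=1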

tumusudheer commented 4 years ago

Hi @tlikhomanenko,

Thank you very much. I was able to train using the directions given here. I trained for 50 epochs and got a WER of ~22. But that page mentions "We got a WER of 18.96 on test-clean!"

Is that WER after 25 epochs? Or may I know after how many epochs the WER reached 18.96?

I also notice that the model .bin gets overwritten every epoch. Is there a way to save the model .bin and the other files for every epoch, so that during testing I can run inference with multiple models and see which one is better?

Is there documentation explaining the various parameters used in the train/decode configuration files?

Thank you

tlikhomanenko commented 4 years ago

I also notice that the model .bin gets overwritten every epoch. Is there a way to save the model .bin and the other files for every epoch, so that during testing I can run inference with multiple models and see which one is better?

There is an option for this, https://github.com/facebookresearch/wav2letter/blob/master/src/common/Defines.cpp#L79, which saves a snapshot at each update. Here https://github.com/facebookresearch/wav2letter/blob/master/Train.cpp#L358 you can add whatever save condition you want (if you don't need every iteration); for example, change this row to if (FLAGS_itersave && iter % 100 == 0) { to save every 100 updates. You also have this option https://github.com/facebookresearch/wav2letter/blob/master/src/common/Defines.cpp#L159 to specify how often to run validation; the best model is always saved, so you don't need to evaluate every snapshot. Just specify this flag and the best model will be saved during training.
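
To make the suggested change concrete, here is a minimal, self-contained C++ sketch of the "save every N updates" idea. It is not the actual Train.cpp code: FLAGS_itersave mirrors the real --itersave flag, while saveSnapshot() is a hypothetical stand-in for wav2letter's own model-serialization call.

// Sketch only; the real logic lives in Train.cpp around the linked line.
#include <cstdint>
#include <iostream>

// Hypothetical stand-in for wav2letter's checkpoint-writing code.
void saveSnapshot(int64_t iter) {
  std::cout << "saving model snapshot at update " << iter << std::endl;
}

int main() {
  const bool FLAGS_itersave = true;  // as if --itersave=true were set
  const int64_t totalUpdates = 1000; // e.g. --iter=1000
  const int64_t saveEvery = 100;     // the "% 100" from the suggested condition

  for (int64_t iter = 1; iter <= totalUpdates; ++iter) {
    // ... forward / backward / optimizer step would run here ...
    if (FLAGS_itersave && iter % saveEvery == 0) {
      saveSnapshot(iter);
    }
  }
  return 0;
}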

Is there documentation explaining the various parameters used in the train/decode configuration files?

You can find all flags here: https://github.com/facebookresearch/wav2letter/blob/master/src/common/Defines.cpp. For the decoder, we have extensive documentation here: https://github.com/facebookresearch/wav2letter/wiki/Beam-Search-Decoder

Thank you very much. I was able to train using the directions given here. I trained for 50 epochs and got a WER of ~22. But that page mentions "We got a WER of 18.96 on test-clean!"

The WER of 18.96 is obtained on test-clean (not dev-clean), and it is measured after the decoder, so an LM is incorporated; that is why the results are much better compared to the Viterbi WER reported in the training log. Could you run the decoder and check what you get with it?

tumusudheer commented 4 years ago

Hi @tlikhomanenko ,

Thank you very much for your help

This is the result of my decoder step: [Decode lists/test-clean.lst (2620 samples) in 126.635s (actual decoding time 0.191s/sample) -- WER: 22.1242, LER: 10.3914]

The decoder command is: ./build/Decoder --flagsfile tutorials/1-librispeech_clean/decode.cfg

And my decode.cfg

# Decoding config for Mini Librispeech
# Replace `[...]` with appropriate paths
--lexicon=/data/Self/user01/facebook/data/lm/lexicon.txt
--lm=/data/Self/user01/facebook/data/lm/3-gram.arpa
--am=/data/Self/user01/facebook/data/run/librispeech_clean_trainlogs/001_model_lists#dev-clean.lst.bin
--test=lists/test-clean.lst
--sclite=/data/Self/user01/facebook/data/decode_logs
--lmweight=2.5
--wordscore=1
--beamsize=500
--beamthreshold=25
--silweight=-0.5
--nthread_decoder=4
--smearing=max
--show=true

I've also attached my decoder logs (from /data/Self/user01/facebook/data/decode_logs).

Thank you

decode_logs.tar.gz

tlikhomanenko commented 4 years ago

What is your Viterbi WER in the training log on dev-clean?

@vineelpratap any idea on disagreement with tutorial numbers?

tumusudheer commented 4 years ago

Hi @tlikhomanenko ,

Thank you very much.

My train TER and WER after the 49th epoch are 3.77 and 17.02, respectively. Similarly, the TER and WER on dev-clean (lists/dev-clean.lst-TER and lists/dev-clean.lst-WER) after the 49th epoch are 18.17 and 52.11.

Please find the training logs attached here as well. train_logs.tar.gz

tlikhomanenko commented 4 years ago

OK, probably this 3-4% WER difference is just training variation, plus the decoder parameters will not be exactly tuned for your model. At least the improvement from 52% Viterbi WER to 22% on dev-clean looks reasonable to me.

rajeevbaalwan commented 4 years ago

This is the result of my decoder step: [Decode lists/test-clean.lst (2620 samples) in 126.635s (actual decoding time 0.191s/sample) -- WER: 22.1242, LER: 10.3914]

Hi @tumusudheer, which data did you train on, full LibriSpeech or just the 100-hour subset, to get WER 22 and LER 10 on test-clean?

When I train the tutorial model on the 100-hour LibriSpeech subset with the provided training config, I get WER 48.4 and TER 16.9 on test-clean after decoding with a 4-gram KenLM language model built on full LibriSpeech. How did you get from 52% WER to 22%? Is the 3-gram LM you used in decoding built on full LibriSpeech, or just the 100-hour subset + test-clean?

mironnn commented 4 years ago


To see the per-epoch training log, use --logtostderr=1 when you run training. Note that --iter=25 means 25 updates, not 25 epochs, so set this value higher, say to 1000000, for longer training.

Could you please advise me? I also tried to run the tutorial, and I always get only one epoch (even with 1000 iterations) and an error at the end.

I0715 19:16:15.572331 413 W2lListFilesDataset.cpp:141] 2703 files found.
I0715 19:16:15.572844 413 Utils.cpp:102] Filtered 0/2703 samples
I0715 19:16:15.573438 413 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 676
I0715 19:16:15.575392 413 Train.cpp:566] Shuffling trainset
I0715 19:16:15.577626 413 Train.cpp:573] Epoch 1 started!
I0715 19:31:17.412422 413 Train.cpp:345] epoch: 1 | nupdates: 1001 | lr: 0.012513 | lrcriterion: 0.000000 | runtime: 00:12:28 | bch(ms): 748.20 | smp(ms): 0.54 | fwd(ms): 281.85 | crit-fwd(ms): 25.53 | bwd(ms): 426.05 | optim(ms): 36.68 | loss: 51.37616 | train-TER: 99.13 | train-WER: 100.00 | lists/dev-clean.lst-loss: 28.94163 | lists/dev-clean.lst-TER: 100.00 | lists/dev-clean.lst-WER: 100.00 | avg-isz: 1243 | avg-tsz: 208 | max-tsz: 317 | hrs: 13.83 | thrpt(sec/sec): 66.50
Memory Manager Stats
MemoryManager type: CachingMemoryManager
Number of allocated blocks:26
Size of free block pool (small):18
Size of free block pool (large):11
Total native mallocs:18
Total native frees:0
I0715 19:31:17.476619 413 Train.cpp:748] Finished training
pure virtual method called
terminate called without an active exception
*** Aborted at 1594841477 (unix time) try "date -d @1594841477" if you are using GNU date ***
PC: @     0x7f61b5835e97 gsignal
*** SIGABRT (@0x19d) received by PID 413 (TID 0x7f61c106c800) from PID 413; stack trace: ***
    @     0x7f61b676a890 (unknown)
    @     0x7f61b5835e97 gsignal
    @     0x7f61b5837801 abort
    @     0x7f61b622a957 (unknown)
    @     0x7f61b6230ae6 (unknown)
    @     0x7f61b6230b21 std::terminate()
    @     0x7f61b62318ff __cxa_pure_virtual
    @     0x560ba8718378 (unknown)
    @     0x7f61bf21fd30 MemoryManagerFunctionWrapper::unlock()
    @     0x560ba8489796 (unknown)
    @     0x7f61be4035f8 cpu::destroyArray<>()
    @     0x7f61befbfa85 af_release_array
    @     0x7f61b583a041 (unknown)
    @     0x7f61b583a13a exit
    @     0x7f61b5818b9e __libc_start_main
    @     0x560ba846ab9a (unknown)
Aborted

tumusudheer commented 4 years ago

Hi @tumusudheer, which data did you train on, full LibriSpeech or just the 100-hour subset, to get WER 22 and LER 10 on test-clean?

When I train the tutorial model on the 100-hour LibriSpeech subset with the provided training config, I get WER 48.4 and TER 16.9 on test-clean after decoding with a 4-gram KenLM language model built on full LibriSpeech. How did you get from 52% WER to 22%? Is the 3-gram LM you used in decoding built on full LibriSpeech, or just the 100-hour subset + test-clean?

Hi @rajeevbaalwan ,

I trained on the 100-hour subset of the LibriSpeech train-clean set. My training configuration file is:

# Training config for Mini Librispeech
# Replace `[...]` with appropriate paths
--datadir=/home/user01/Self/Research/Work/facebook/data
--rundir=/data/user01/facebook/wav2letter/train_runs
--archdir=/data/user01/facebook/wav2letter/tutorials/1-librispeech_clean/
--train=lists/train-clean-100.lst
--valid=lists/dev-clean.lst
--input=flac
--arch=network.arch
--tokens=/home/user01/Self/Research/Work/facebook/data/am/tokens.txt
--lexicon=/home/user01/Self/Research/Work/facebook/data/am/lexicon.txt
--criterion=ctc
--lr=0.1
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=4
--batchsize=4
--runname=librispeech_clean_trainlogs
--iter=1000000

The 3-gram ARPA LM was downloaded from here. I ran training for 49 epochs.

How many epochs of training did you run? Can you post your training and decoding configuration files so we can look at the parameters and see if anything is off?

tlikhomanenko commented 4 years ago

Could you please advise me? I also tried to run the tutorial, and I always get only one epoch (even with 1000 iterations) and an error at the end.

Use --iter=100000000; as I pointed out before, we changed this parameter to be the number of updates, not the number of epochs.

mironnn commented 4 years ago


Use --iter=100000000; as I pointed out before, we changed this parameter to be the number of updates, not the number of epochs.

Yes, thank you. This helped. Could you please advise how to calculate how many iterations I need for a certain number of epochs? It's a pity, but I only get a WER of 50%, reached at around 10-15 epochs; further training has not improved the result.

tlikhomanenko commented 4 years ago

WER 50% is expected with the very small model we use in the tutorial. Check recipes/models/sota for our best results.

About converting epochs into iterations: just take the number of rows in the train list and divide it by the total batch size (number of GPUs you are using * the batchsize parameter); this gives the number of iterations in one epoch.
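
As a concrete check (a minimal sketch, using the single-GPU run and the 28539-sample train list from the training log earlier in this thread), the arithmetic below reproduces the 7135 batches per epoch that W2lListFilesDataset reported:

// Sketch of the epoch-to-iteration arithmetic described above (not wav2letter code).
#include <cstdint>
#include <iostream>

int main() {
  const int64_t trainListRows = 28539; // rows in train-clean-100.lst (from the log above)
  const int64_t numGpus = 1;           // single-GPU run
  const int64_t batchSize = 4;         // --batchsize=4
  const int64_t desiredEpochs = 50;

  const int64_t totalBatchSize = numGpus * batchSize;
  // Ceiling division: a final partial batch still counts as one iteration.
  const int64_t itersPerEpoch = (trainListRows + totalBatchSize - 1) / totalBatchSize;

  std::cout << "iterations per epoch: " << itersPerEpoch << std::endl; // prints 7135
  std::cout << "--iter for " << desiredEpochs << " epochs: "
            << itersPerEpoch * desiredEpochs << std::endl;             // prints 356750
  return 0;
}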

mironnn commented 4 years ago

WER 50% is expected with the very small model we use in the tutorial. Check recipes/models/sota for our best results.

About converting epochs into iterations: just take the number of rows in the train list and divide it by the total batch size (number of GPUs you are using * the batchsize parameter); this gives the number of iterations in one epoch.

Thank you @tlikhomanenko for your help! Sorry for the stupid question, but how can I run the pre-trained models from recipes/models/sota? Could I use multithreaded_streaming_asr_example from the Inference tutorial? I think the sota models and the model for multithreaded_streaming_asr_example have different formats, or do they not? (The files are different; the sota models have no feature_extractor.bin, for example.)

tlikhomanenko commented 4 years ago

@mironnn

They have different formats; you need to convert the sota models, and right now we support this only for TDS models. Please follow the instructions for the TDS converter in wav2letter/tools.

iggygeek commented 3 years ago

This is the result of my decoder step: [Decode lists/test-clean.lst (2620 samples) in 126.635s (actual decoding time 0.191s/sample) -- WER: 22.1242, LER: 10.3914]

I also followed the tutorial step by step and obtained: [Decode lists/test-clean.lst (2620 samples) in 130.133s (actual decoding time 0.195s/sample) -- WER: 23.1874, LER: 10.8369]

The best trained model's log entry is:

# date  time    epoch   nupdates    lr  lrcriterion runtime bch(ms) smp(ms) fwd(ms) crit-fwd(ms)    bwd(ms) optim(ms)   loss    train-TER   train-WER   lists/dev-clean.lst-loss    lists/dev-clean.lst-TER lists/dev-clean.lst-WER avg-isz avg-tsz max-tsz hrs thrpt(sec/sec)
2020-08-27 20:54:06 53 378155 0.050000 0.000000 00:05:51 49.23 2.12 30.79 26.78 10.62 1.74 2.79191 7.18 28.29 7.38088 18.93 53.85 1267 213 400 100.47 1029.75

Is it really normal to have a 4% difference from the result reported in the tutorial (18.96)?