flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

data preparation #340

Closed SY-nc closed 5 years ago

SY-nc commented 5 years ago

I tried preparing data using the script recipes/librispeech/data/prepare_data.py and for some reason, it was only writing the .lst files and not the other files.

After spending some time trying to figure out the problem, I manually placed one audio file and its corresponding id, tkn, and wrd files in a folder, and set the path to that folder in recipes/librispeech/configs/conv_glu/train.cfg.

Now when I try running training, it says '0 files found' but still starts running epochs, and all the subsequent outputs are zero, like this:

I0701 16:56:51.690100  1646 Train.cpp:249] [Criterion Optimizer] SGD (for first 1 epochs)
I0701 16:56:51.749213  1646 W2lListFilesDataset.cpp:137] 0 files found. 
I0701 16:56:51.749238  1646 Utils.cpp:102] Filtered 0/0 samples
I0701 16:56:51.749248  1646 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 0
I0701 16:56:51.810676  1646 W2lListFilesDataset.cpp:137] 0 files found. 
I0701 16:56:51.810699  1646 Utils.cpp:102] Filtered 0/0 samples
I0701 16:56:51.810705  1646 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 0
I0701 16:56:51.810765  1646 Train.cpp:493] Shuffling trainset
I0701 16:56:51.810791  1646 Train.cpp:500] Epoch 1 started!
I0701 16:56:51.810850  1646 Train.cpp:296] epoch:        1 | lr: 0.600000 | lrcriterion: 0.006000 | runtime: 00:00:00 | bch(ms): 0.00 | smp(ms): 0.00 | fwd(ms): 0.00 | crit-fwd(ms): 0.00 | bwd(ms): 0.00 | optim(ms): 0.00 | loss:    0.00000 | train-TER:  0.00 | train-WER:  0.00 | data/dev-clean-loss:    0.00000 | data/dev-clean-TER:  0.00 | data/dev-clean-WER:  0.00 | avg-isz: 000 | avg-tsz: 000 | max-tsz: 000 | hrs:    0.00 | thrpt(sec/sec): n/a
I0701 16:56:52.031046  1646 Train.cpp:607] Finished LinSeg
I0701 16:56:52.031132  1646 Train.cpp:493] Shuffling trainset
I0701 16:56:52.031142  1646 Train.cpp:500] Epoch 2 started!
I0701 16:56:52.031198  1646 Train.cpp:296] epoch:        2 | lr: 0.600000 | lrcriterion: 0.006000 | runtime: 00:00:00 | bch(ms): 0.00 | smp(ms): 0.00 | fwd(ms): 0.00 | crit-fwd(ms): 0.00 | bwd(ms): 0.00 | optim(ms): 0.00 | loss:    0.00000 | train-TER:  0.00 | train-WER:  0.00 | data/dev-clean-loss:    0.00000 | data/dev-clean-TER:  0.00 | data/dev-clean-WER:  0.00 | avg-isz: 000 | avg-tsz: 000 | max-tsz: 000 | hrs:    0.00 | thrpt(sec/sec): n/a
I0701 16:56:52.256551  1646 Train.cpp:493] Shuffling trainset
I0701 16:56:52.256577  1646 Train.cpp:500] Epoch 3 started!

Here is my flags file:

# Training config for Librispeech using Gated ConvNets
# Replace `[...]` with appropriate paths
--runname=librispeech_conv_glu
--rundir=/home/satishyadav/speech/mytest/big/
--datadir=/home/satishyadav/speech/mytest/big/
--tokensdir=/home/satishyadav/speech/mytest/big/
--archdir=/home/satishyadav/speech/wav2letter/recipes/librispeech/configs/conv_glu/
--listdata=true
--train=data/train-clean-100
--valid=data/dev-clean
--lexicon=/home/satishyadav/speech/mytest/big/lm/lexicon.txt
--input=wav
--arch=network-tut.arch
--tokens=data/tokens.txt
--criterion=asg
--lr=0.6
--lrcrit=0.006
--linseg=1
--momentum=0.8
--maxgradnorm=0.2
--replabel=2
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--nthread=6
--batchsize=1
--transdiag=4

Any idea where I'm going wrong? And please tell me the possible reasons why the script might have prepared the data incorrectly.

SY-nc commented 5 years ago

After commenting out the --listdata=true flag, it was able to detect the files.

SY-nc commented 5 years ago

I'm still struggling with data preparation.

I tried the script with one audio file. After preparing the data, all the .lst files are created, and the contents of the train-clean-100.lst file are:

train-clean-100-1001 /home/satishyadav/speech/mytest/tut/used/batch-123/train-clean-100/1001.wav 8177.375 a quick brown fox jumps over the lazy dog

The first line has my file name and path, but the number '8177.375' seems completely random and doesn't have anything to do with my file.

When I tried training, I got the following error:

I0702 12:47:51.835745  5880 Train.cpp:166] Number of classes (network): 30
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Loading words] Invalid line:  
*** Aborted at 1562051871 (unix time) try "date -d @1562051871" if you are using GNU date ***
PC: @     0x7efc0ed87428 gsignal
*** SIGABRT (@0x16f8) received by PID 5880 (TID 0x7efc382ec5c0) from PID 5880; stack trace: ***
    @     0x7efc16b44390 (unknown)
    @     0x7efc0ed87428 gsignal
    @     0x7efc0ed8902a abort
    @     0x7efc0f8ec84d __gnu_cxx::__verbose_terminate_handler()
    @     0x7efc0f8ea6b6 (unknown)
    @     0x7efc0f8ea701 std::terminate()
    @     0x7efc0f8ea919 __cxa_throw
    @           0x567ecf w2l::loadWords()
    @           0x4197d3 main
    @     0x7efc0ed72830 __libc_start_main
    @           0x4792a9 _start
Aborted (core dumped)
tlikhomanenko commented 5 years ago

Hi @SYnchronYSe,

What do you mean by "'8177.375' is completely random"? This number should be the size of the wav file (number of frames).

Could you also post the full log that you have?

tlikhomanenko commented 5 years ago

It seems that you provided an incorrect lexicon file. What does it look like?

SY-nc commented 5 years ago

Hi @SYnchronYSe,

What do you mean by "'8177.375' is completely random"? This number should be the size of the wav file (number of frames).

Could you also post the full log that you have?

Sorry, I couldn't figure it out. It seemed random to me at first glance.

Here's the full log of the process:

~/speech$ sudo wav2letter/recipes/librispeech/data/prepare_data.py --src mytest/tut/used/batch-123/ --dst mytest/recip/
[sudo] password for satishyadav: 
analyzing mytest/tut/used/batch-123/test-clean...
writing to mytest/recip/test-clean.lst...
0it [00:00, ?it/s]
analyzing mytest/tut/used/batch-123/test-other...
writing to mytest/recip/test-other.lst...
0it [00:00, ?it/s]
analyzing mytest/tut/used/batch-123/train-clean-100...
writing to mytest/recip/train-clean-100.lst...
100%|█████████████████████████████████████████████| 1/1 [00:00<00:00, 24.31it/s]
analyzing mytest/tut/used/batch-123/train-clean-360...
writing to mytest/recip/train-clean-360.lst...
0it [00:00, ?it/s]
analyzing mytest/tut/used/batch-123/train-other-500...
writing to mytest/recip/train-other-500.lst...
0it [00:00, ?it/s]
analyzing mytest/tut/used/batch-123/dev-clean...
writing to mytest/recip/dev-clean.lst...
0it [00:00, ?it/s]
analyzing mytest/tut/used/batch-123/dev-other...
writing to mytest/recip/dev-other.lst...
0it [00:00, ?it/s]
creating tokens list...
creating word -> tokens lexicon...
Done !

Please note that I have kept all the folders empty except train-clean-100, which contains a single wav file and the corresponding id, tkn, and wrd files.

SY-nc commented 5 years ago

It seems that you provided an incorrect lexicon file. What does it look like?

Is it required for the data preparation process too?

tlikhomanenko commented 5 years ago

@SYnchronYSe,

Sorry, I couldn't figure it out. It seemed random to me at first glance.

It is the duration in milliseconds, so you can check this for one audio file.

Several additional questions for you:

• Do you use the one which is generated by the data preparation script?
• Could you show the head of your lexicon file?

SY-nc commented 5 years ago

It is the duration in milliseconds, so you can check this for one audio file.

You're right. I just verified.
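
For reference, it's easy to check with a few lines of Python (a minimal sketch using only the standard library; the path is the one from my .lst line above):

# Sanity-check the third field of the .lst line: the audio duration
# in milliseconds, i.e. frames / sample rate * 1000.
import wave

path = "/home/satishyadav/speech/mytest/tut/used/batch-123/train-clean-100/1001.wav"
with wave.open(path) as f:
    duration_ms = f.getnframes() / f.getframerate() * 1000

print(duration_ms)  # prints 8177.375 for this file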

Do you use the one which is generated by the data preparation script?

Sorry, I made things a little confusing by describing two different problems in one thread. The training flagsfile that you see above was for another problem, where the model was not detecting my audio files. I solved that one by commenting out the --listdata=true flag.

The problem that remains now is after the data preparation.

This is my new train flagsfile:

--runname=librispeech_conv_glu
--rundir=/home/satishyadav/speech/mytest/recip/
--datadir=/home/satishyadav/speech/mytest/recip/
--tokensdir=/home/satishyadav/speech/mytest/recip/
--archdir=/home/satishyadav/speech/wav2letter/recipes/librispeech/configs/conv_glu/
#--listdata=true
--train=train-clean-100.lst
--valid=dev-clean.lst
--lexicon=/home/satishyadav/speech/mytest/recip/librispeech-train+dev-tokens.dict
--input=wav
--arch=network-big.arch
--tokens=tokens.txt
--criterion=asg
--lr=0.1
--lrcrit=0.001
--linseg=1
#--momentum=0.8
--maxgradnorm=1.0
--replabel=2
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--nthread=6
--batchsize=1
--transdiag=4

This is the training log:

I0704 13:18:20.385517 15610 Train.cpp:57] Reading flags from file wav2letter/recipes/librispeech/configs/conv_glu/train.cfg
I0704 13:18:20.403326 15610 Train.cpp:136] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --arch=network-big.arch; --archdir=/home/satishyadav/speech/wav2letter/recipes/librispeech/configs/conv_glu/; --attention=content; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=2500; --beamthreshold=25; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=/home/satishyadav/speech/mytest/recip/; --dataorder=input; --decodertype=wrd; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/recipes/librispeech/configs/conv_glu/train.cfg; --gamma=1; --garbage=false; --input=wav; --inputbinsize=100; --inputfeeding=false; --iter=1000000; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/home/satishyadav/speech/mytest/recip/librispeech-train+dev-tokens.dict; --linlr=-1; --linlrcrit=-1; --linseg=1; --listdata=false; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.10000000000000001; --lrcrit=0.001; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=6; --nthread_decoder=1; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=2; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/home/satishyadav/speech/mytest/recip/; --runname=librispeech_conv_glu; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=; --seed=0; --show=false; --showletters=false; --silweight=0; --smearing=none; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=|; --tag=; --target=tkn; --test=; --tokens=tokens.txt; --tokensdir=/home/satishyadav/speech/mytest/recip/; --train=train-clean-100.lst; --trainWithWindow=false; --transdiag=4; --unkweight=-inf; --usewordpiece=false; --valid=dev-clean.lst; --weightdecay=0; --wordscore=1; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0704 13:18:20.403352 15610 Train.cpp:137] Experiment path: /home/satishyadav/speech/mytest/recip/librispeech_conv_glu
I0704 13:18:20.403355 15610 Train.cpp:138] Experiment runidx: 1
I0704 13:18:20.416241 15610 Train.cpp:166] Number of classes (network): 30
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Loading words] Invalid line:  
*** Aborted at 1562226500 (unix time) try "date -d @1562226500" if you are using GNU date ***
PC: @     0x7f84aeb08428 gsignal
*** SIGABRT (@0x3cfa) received by PID 15610 (TID 0x7f84d806d5c0) from PID 15610; stack trace: ***
    @     0x7f84b68c5390 (unknown)
    @     0x7f84aeb08428 gsignal
    @     0x7f84aeb0a02a abort
    @     0x7f84af66d84d __gnu_cxx::__verbose_terminate_handler()
    @     0x7f84af66b6b6 (unknown)
    @     0x7f84af66b701 std::terminate()
    @     0x7f84af66b919 __cxa_throw
    @           0x567ecf w2l::loadWords()
    @           0x4197d3 main
    @     0x7f84aeaf3830 __libc_start_main
    @           0x4792a9 _start
Aborted (core dumped)
Could you show the head of your lexicon file?

You're damn right. For some reason, the data preparation script is creating a lexicon with the first line blank:


a a
brown b r o w n
dog d o g
fox f o x
jumps j u m p s
lazy l a z y
over o v e r
quick q u i c k
the t h e

I just manually removed the first blank line, and that bypassed the error.
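
In case anyone else hits this, a quick sanity check on the lexicon (my own sketch, not part of wav2letter) would be:

# Flag lexicon lines that are empty or contain tabs; the blank first line
# is what triggered the "[Loading words] Invalid line" error above.
lexicon_path = "/home/satishyadav/speech/mytest/recip/librispeech-train+dev-tokens.dict"
with open(lexicon_path) as f:
    for n, raw in enumerate(f, 1):
        line = raw.rstrip("\n")
        if not line.strip():
            print("line", n, "is empty")
        elif "\t" in line:
            print("line", n, "contains a tab:", repr(line))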

Now my data directory looks like this:

[Screenshot of the data directory, 2019-07-04]

But now I'm getting a 'directory doesn't exist' error as shown:

~/speech$ sudo wav2letter/build/Train train --flagsfile wav2letter/recipes/librispeech/configs/conv_glu/train.cfg --logtostderr=1
I0704 13:24:49.738492 15847 Train.cpp:57] Reading flags from file wav2letter/recipes/librispeech/configs/conv_glu/train.cfg
I0704 13:24:49.750335 15847 Train.cpp:136] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --adambeta1=0.90000000000000002; --adambeta2=0.999; --am=; --arch=network-big.arch; --archdir=/home/satishyadav/speech/wav2letter/recipes/librispeech/configs/conv_glu/; --attention=content; --attnWindow=no; --attnconvchannel=0; --attnconvkernel=0; --attndim=0; --batchsize=1; --beamsize=2500; --beamthreshold=25; --channels=1; --criterion=asg; --critoptim=sgd; --datadir=/home/satishyadav/speech/mytest/recip/; --dataorder=input; --decodertype=wrd; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=wav2letter/recipes/librispeech/configs/conv_glu/train.cfg; --gamma=1; --garbage=false; --input=wav; --inputbinsize=100; --inputfeeding=false; --iter=1000000; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=/home/satishyadav/speech/mytest/recip/librispeech-train+dev-tokens.dict; --linlr=-1; --linlrcrit=-1; --linseg=1; --listdata=false; --lm=; --lm_memory=5000; --lm_vocab=; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.10000000000000001; --lrcrit=0.001; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0; --netoptim=sgd; --noresample=false; --nthread=6; --nthread_decoder=1; --onorm=target; --optimepsilon=1e-08; --optimrho=0.90000000000000002; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --pretrainWindow=0; --replabel=2; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/home/satishyadav/speech/mytest/recip/; --runname=librispeech_conv_glu; --samplerate=16000; --sampletarget=0; --samplingstrategy=rand; --sclite=; --seed=0; --show=false; --showletters=false; --silweight=0; --smearing=none; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1000000; --surround=|; --tag=; --target=tkn; --test=; --tokens=tokens.txt; --tokensdir=/home/satishyadav/speech/mytest/recip/; --train=train-clean-100.lst; --trainWithWindow=false; --transdiag=4; --unkweight=-inf; --usewordpiece=false; --valid=dev-clean.lst; --weightdecay=0; --wordscore=1; --wordseparator=|; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0704 13:24:49.750355 15847 Train.cpp:137] Experiment path: /home/satishyadav/speech/mytest/recip/librispeech_conv_glu
I0704 13:24:49.750360 15847 Train.cpp:138] Experiment runidx: 1
I0704 13:24:49.750713 15847 Train.cpp:166] Number of classes (network): 30
I0704 13:24:49.750739 15847 Train.cpp:173] Number of words: 10
I0704 13:24:49.750754 15847 Train.cpp:187] Loading architecture file from /home/satishyadav/speech/wav2letter/recipes/librispeech/configs/conv_glu/network-big.arch
I0704 13:24:50.139755 15847 Train.cpp:208] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> (21) -> (22) -> (23) -> (24) -> (25) -> (26) -> (27) -> (28) -> (29) -> output]
    (0): View (-1 1 40 0)
    (1): WeightNorm (Conv2D (40->400, 13x1, 1,1, 170,0, 1, 1) (with bias), 3)
    (2): GatedLinearUnit (2)
    (3): Dropout (0.200000)
    (4): WeightNorm (Conv2D (200->440, 14x1, 1,1, 0,0, 1, 1) (with bias), 3)
    (5): GatedLinearUnit (2)
    (6): Dropout (0.214000)
    (7): WeightNorm (Conv2D (220->484, 15x1, 1,1, 0,0, 1, 1) (with bias), 3)
    (8): GatedLinearUnit (2)
    (9): Dropout (0.228980)
    (10): WeightNorm (Conv2D (242->532, 16x1, 1,1, 0,0, 1, 1) (with bias), 3)
    (11): GatedLinearUnit (2)
    (12): Dropout (0.245009)
    (13): WeightNorm (Conv2D (266->584, 17x1, 1,1, 0,0, 1, 1) (with bias), 3)
    (14): GatedLinearUnit (2)
    (15): Dropout (0.262159)
    (16): WeightNorm (Conv2D (292->642, 18x1, 1,1, 0,0, 1, 1) (with bias), 3)
    (17): GatedLinearUnit (2)
    (18): Dropout (0.280510)
    (19): WeightNorm (Conv2D (321->706, 19x1, 1,1, 0,0, 1, 1) (with bias), 3)
    (20): GatedLinearUnit (2)
    (21): Dropout (0.300146)
    (22): WeightNorm (Conv2D (353->776, 20x1, 1,1, 0,0, 1, 1) (with bias), 3)
    (23): GatedLinearUnit (2)
    (24): Dropout (0.321156)
    (25): Reorder (2,0,3,1)
    (26): WeightNorm (Linear (388->776) (with bias), 0)
    (27): GatedLinearUnit (0)
    (28): Dropout (0.321156)
    (29): WeightNorm (Linear (388->30) (with bias), 0)
I0704 13:24:50.139801 15847 Train.cpp:209] [Network Params: 21220226]
I0704 13:24:50.139807 15847 Train.cpp:210] [Criterion] AutoSegmentationCriterion
I0704 13:24:50.139816 15847 Train.cpp:218] [Network Optimizer] SGD
I0704 13:24:50.139819 15847 Train.cpp:219] [Criterion Optimizer] SGD
I0704 13:24:50.139838 15847 Train.cpp:233] [Criterion] LinearSegmentationCriterion (for first 1 epochs)
I0704 13:24:50.139847 15847 Train.cpp:246] [Network Optimizer] SGD (for first 1 epochs)
I0704 13:24:50.139852 15847 Train.cpp:249] [Criterion Optimizer] SGD (for first 1 epochs)
I0704 13:24:50.140286 15847 NumberedFilesLoader.cpp:29] Adding dataset /home/satishyadav/speech/mytest/recip/train-clean-100.lst ...
F0704 13:24:50.140336 15847 NumberedFilesLoader.cpp:32] Directory '/home/satishyadav/speech/mytest/recip/train-clean-100.lst' doesn't exist
*** Check failure stack trace: ***
    @     0x7f9f0eca615d  google::LogMessage::Fail()
    @     0x7f9f0eca8713  google::LogMessage::SendToLog()
    @     0x7f9f0eca5ceb  google::LogMessage::Flush()
    @     0x7f9f0eca765e  google::LogMessageFatal::~LogMessageFatal()
    @           0x5c541f  w2l::NumberedFilesLoader::NumberedFilesLoader()
    @           0x5c3c2d  std::vector<>::_M_emplace_back_aux<>()
    @           0x5c2e0b  w2l::W2lNumberedFilesDataset::W2lNumberedFilesDataset()
    @           0x5acc01  w2l::createDataset()
    @           0x41ac85  main
    @     0x7f9f0de2d830  __libc_start_main
    @           0x4792a9  _start
Aborted (core dumped)
SY-nc commented 5 years ago

Please note that the prepare_data.py recipe didn't create any files, except the ones in the screenshot.

And apparently, it's expecting train-clean-100.lst to be a directory containing the four files: wav, id, tkn, and wrd.

I matched my file path formats with the ones given in train.cfg in the actual repo.

tlikhomanenko commented 5 years ago

@SYnchronYSe

And apparently, it's expecting train-clean-100.lst to be a directory containing the four files: wav, id, tkn, and wrd.

There are two options for providing a dataset: either specify the path to a folder where all the files are stored, or provide a list file in which each line gives the id, the path to the audio file, its duration in milliseconds, and the original transcription. So train-clean-100.lst is not a folder, it is a list file, and you need to uncomment the row #--listdata=true.
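
For example, one line of such a list file could be produced like this (a minimal sketch, not the actual recipe script; lst_line is a hypothetical helper):

# Build one line of a wav2letter list file:
#   <id> <path to audio> <duration in milliseconds> <transcription>
import wave

def lst_line(sample_id, wav_path, transcription):
    with wave.open(wav_path) as f:
        duration_ms = f.getnframes() / f.getframerate() * 1000
    return "{} {} {} {}".format(sample_id, wav_path, duration_ms, transcription)

# e.g. lst_line("train-clean-100-1001", "/path/to/1001.wav",
#               "a quick brown fox jumps over the lazy dog")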

Did I understand correctly that if you are using --listdata=true then you get the error with the following log?

I0704 13:18:20.385517 15610 Train.cpp:57] Reading flags from file wav2letter/recipes/librispeech/configs/conv_glu/train.cfg
[... same Gflags dump as above ...]
I0704 13:18:20.416241 15610 Train.cpp:166] Number of classes (network): 30
terminate called after throwing an instance of 'std::runtime_error'
  what():  [Loading words] Invalid line:  
[... same stack trace as above ...]
Aborted (core dumped)
tlikhomanenko commented 5 years ago

Please comment out --input=wav in your config, because LibriSpeech contains files in flac format.

SY-nc commented 5 years ago

Please comment out --input=wav in your config, because LibriSpeech contains files in flac format.

The files that I'll be dealing with are in wav format; I had changed this flag from flac.

There are two options for providing a dataset: either specify the path to a folder where all the files are stored, or provide a list file in which each line gives the id, the path to the audio file, its duration in milliseconds, and the original transcription. So train-clean-100.lst is not a folder, it is a list file, and you need to uncomment the row #--listdata=true.

Thank you so much for the clarification. I'll try this out and update you. Closing this for the time being.

tlikhomanenko commented 5 years ago

@SYnchronYSe,

You're damn right. For some reason, the data preparation script is creating a lexicon with the first line blank:

I reran data preparation on my local machine and the lexicon is generated correctly (the first row is not empty).

SY-nc commented 5 years ago

you need to uncomment the row #--listdata=true.

Thanks. It works.

I reran data preparation on my local machine and the lexicon is generated correctly

My transcripts file had tabs between each filename and the corresponding transcript. As soon as I replaced the tabs with single spaces, my lexicon file was generated correctly. I guess the tab was being treated as a separate word.

Thanks for helping me out patiently.

adamchant commented 4 years ago

Hey, I'm facing a similar issue with data loading...

I0410 19:25:33.644801 10672 Train.cpp:250] [Network Params: 93568814]
I0410 19:25:33.644846 10672 Train.cpp:251] [Criterion] ConnectionistTemporalClassificationCriterion
I0410 19:25:33.644908 10672 Train.cpp:259] [Network Optimizer] SGD
I0410 19:25:33.644919 10672 Train.cpp:260] [Criterion Optimizer] SGD
I0410 19:25:47.581954 10672 W2lListFilesDataset.cpp:141] 527188 files found.
I0410 19:25:47.605779 10672 Utils.cpp:102] Filtered 527188/527188 samples
I0410 19:25:47.605832 10672 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 0
I0410 19:25:50.677551 10672 W2lListFilesDataset.cpp:141] 116208 files found.
I0410 19:25:50.682888 10672 Utils.cpp:102] Filtered 116208/116208 samples
I0410 19:25:50.682924 10672 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 0

Files are being found:

I0410 19:25:47.581954 10672 W2lListFilesDataset.cpp:141] 527188 files found.

But the number of batches being generated is 0:

I0410 19:25:47.605832 10672 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 0

Because of this, my train logs look like:

I0410 19:31:45.785372 10890 Train.cpp:564] Epoch 1 started!
I0410 19:31:46.039963 10890 Train.cpp:342] epoch: 1 | nupdates: 0 | lr: 0.400000 | lrcriterion: 0.000000 | runtime: 00:00:00 | bch(ms): 0.00 | smp(ms): 0.00 | fwd(ms): 0.00 | crit-fwd(ms): 0.00 | bwd(ms): 0.00 | optim(ms): 0.00 | loss: 0.00000 | train-TER: 0.00 | train-WER: 0.00 | dev-clean-loss: 0.00000 | dev-clean-TER: 0.00 | dev-clean-WER: 0.00 | avg-isz: 000 | avg-tsz: 000 | max-tsz: 000 | hrs: 0.00 | thrpt(sec/sec): n/a

I am running the streaming_convnets recipe. Can someone help me understand this issue? Thank you.

SY-nc commented 4 years ago

@adamchant could you please show your flagsfile contents (train.cfg)?

lagidigu commented 4 years ago

@adamchant @SYnchronYSe

I ran into a similar problem, where all of the samples got filtered out, resulting in 0 total batches (i.e. iters).

The issue was that the durations in the .lst files were in seconds. Converting them to milliseconds fixed the issue. Hope that saves you time.
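
Something like the following does the conversion (a rough sketch assuming the standard four-column layout of id, path, duration, transcription; seconds_to_ms is my own helper):

# Rewrite the duration column of a .lst file from seconds to milliseconds.
import sys

def seconds_to_ms(src_path, dst_path):
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if not line.strip():
                continue
            sample_id, audio_path, duration, *words = line.split()
            dst.write("{} {} {} {}\n".format(
                sample_id, audio_path, float(duration) * 1000, " ".join(words)))

if __name__ == "__main__":
    seconds_to_ms(sys.argv[1], sys.argv[2])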