flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Training error: "what(): Error: compute_ctc_loss, stat = cuda memcpy or memset failed", "Error: compute_ctc_loss, stat = unknown error", "CUDNN_STATUS_BAD_PARAM" #276

Closed: megharangaswamy closed this issue 5 years ago

megharangaswamy commented 5 years ago

Dear all, kindly help me with this.

I am trying to train my model and am getting stuck on a few errors that occur in a random fashion. Can you please elaborate on how to solve this problem? A few threads mention how others encountered these errors, but nowhere is the exact solution stated. Sometimes I get:

1) what(): Error: compute_ctc_loss, stat = cuda memcpy or memset failed (detailed log below)
2) Error: compute_ctc_loss, stat = unknown error (detailed log below)

Below are my console logs.

/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/build/Train train --flagsfile /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/train.cfg
I0424 16:46:03.113786 14218 Train.cpp:139] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --am=; --arch=network.arch; --archdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/; --attention=content; --attnWindow=no; --batchsize=4; --beamscore=25; --beamsize=2500; --channels=1; --criterion=ctc; --datadir=/media/home/megha/5_wav2letter/WAV_2_LETTER/; --dataorder=input; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/train.cfg; --forceendsil=false; --gamma=0.20000000000000001; --garbage=false; --input=flac; --inputbinsize=100; --inputfeeding=false; --iter=100; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.050000000000000003; --lrcrit=0.0060000000000000001; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.80000000000000004; --noresample=false; --nthread=4; --nthread_decoder=1; --onorm=target; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --replabel=2; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/media/home/megha/5_wav2letter/WAV_2_LETTER/; --runname=deutsche_Combined_clean_trainlogs; --samplerate=16000; --samplingstrategy=rand; --sclite=; --seed=0; --show=false; --showletters=false; --silweight=0; --skipoov=false; --smearing=none; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1; --surround=|; --tag=; --target=tkn; --targettype=video; --test=; --tokens=wav2letter/tutorials/output/data/tokens.txt; --tokensdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/; --train=wav2letter/tutorials/output/data/train-clean-100; --trainWithWindow=false; --transdiag=0; --unkweight=-inf; --valid=wav2letter/tutorials/output/data/dev-clean; --weightdecay=0; --wordscore=1; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0424 16:46:03.113804 14218 Train.cpp:141] Experiment path: /media/home/megha/5_wav2letter/WAV_2_LETTER/deutsche_Combined_clean_trainlogs
I0424 16:46:03.113821 14218 Train.cpp:142] Experiment runidx: 1
I0424 16:46:03.114193 14218 Train.cpp:160] Number of classes (network) = 37
I0424 16:46:03.114209 14218 Train.cpp:171] Loading architecture file from /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/network.arch
I0424 16:46:03.798933 14218 Train.cpp:191] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> output]
    (0): View (-1 1 40 0)
    (1): Conv2D (40->256, 8x1, 2,1, SAME,SAME) (with bias)
    (2): ReLU
    (3): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (4): ReLU
    (5): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (6): ReLU
    (7): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (8): ReLU
    (9): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (10): ReLU
    (11): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (12): ReLU
    (13): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (14): ReLU
    (15): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (16): ReLU
    (17): Reorder (2,0,3,1)
    (18): Linear (256->512) (with bias)
    (19): ReLU
    (20): Linear (512->37) (with bias)
I0424 16:46:03.798955 14218 Train.cpp:192] [Network Params: 3904549]
I0424 16:46:03.798974 14218 Train.cpp:193] [Criterion] ConnectionistTemporalClassificationCriterion
I0424 16:46:03.799340 14218 NumberedFilesLoader.cpp:29] Adding dataset /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/output/data/train-clean-100 ...
I0424 16:46:03.799427 14218 NumberedFilesLoader.cpp:68] 2731 files found.
I0424 16:46:03.823515 14218 Utils.cpp:102] Filtered 0/2731 samples
I0424 16:46:03.823717 14218 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 683
I0424 16:46:03.823861 14218 NumberedFilesLoader.cpp:29] Adding dataset /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/output/data/dev-clean ...
I0424 16:46:03.823957 14218 NumberedFilesLoader.cpp:68] 960 files found.
I0424 16:46:03.831384 14218 Utils.cpp:102] Filtered 0/960 samples
I0424 16:46:03.831449 14218 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 240
I0424 16:46:03.928385 14218 Train.cpp:441] Shuffling trainset
I0424 16:46:03.928495 14218 Train.cpp:448] Epoch 1 started!
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: compute_ctc_loss, stat = unknown error
*** Aborted at 1556117170 (unix time) try "date -d @1556117170" if you are using GNU date ***
PC: @     0x7f3a65af2428 gsignal
*** SIGABRT (@0x3e80000378a) received by PID 14218 (TID 0x7f3ae544c800) from PID 14218; stack trace: ***
    @     0x7f3a65af24b0 (unknown)
    @     0x7f3a65af2428 gsignal
    @     0x7f3a65af402a abort
    @     0x7f3a6665784d __gnu_cxx::__verbose_terminate_handler()
    @     0x7f3a666556b6 (unknown)
    @     0x7f3a66655701 std::terminate()
    @     0x7f3a66655919 __cxa_throw
    @           0x525f5b w2l::(anonymous namespace)::throw_on_error()
    @           0x526d01 w2l::ConnectionistTemporalClassificationCriterion::forward()
    @           0x45fa3c _ZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEERNS0_19FirstOrderOptimizerES9_biE4_clES2_S5_S7_S9_S9_bi.constprop.8756
    @           0x418f72 main
    @     0x7f3a65add830 __libc_start_main
    @           0x45bb19 _start
Aborted

$ /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/build/Train train --flagsfile /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/train.cfg
I0425 10:41:39.858770 10847 Train.cpp:139] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --am=; --arch=network.arch; --archdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/; --attention=content; --attnWindow=no; --batchsize=2; --beamscore=25; --beamsize=2500; --channels=1; --criterion=ctc; --datadir=/media/home/megha/5_wav2letter/WAV_2_LETTER/; --dataorder=input; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/train.cfg; --forceendsil=false; --gamma=0.20000000000000001; --garbage=false; --input=flac; --inputbinsize=100; --inputfeeding=false; --iter=100; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.050000000000000003; --lrcrit=0.0060000000000000001; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.80000000000000004; --noresample=false; --nthread=2; --nthread_decoder=1; --onorm=target; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --replabel=2; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/media/home/megha/5_wav2letter/WAV_2_LETTER/; --runname=deutsche_Combined_clean_trainlogs; --samplerate=16000; --samplingstrategy=rand; --sclite=; --seed=0; --show=false; --showletters=false; --silweight=0; --skipoov=false; --smearing=none; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1; --surround=|; --tag=; --target=tkn; --targettype=video; --test=; --tokens=wav2letter/tutorials/output/data/tokens.txt; --tokensdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/; --train=wav2letter/tutorials/output/data/train-clean-100; --trainWithWindow=false; --transdiag=0; --unkweight=-inf; --valid=wav2letter/tutorials/output/data/dev-clean; --weightdecay=0; --wordscore=1; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0425 10:41:39.858788 10847 Train.cpp:141] Experiment path: /media/home/megha/5_wav2letter/WAV_2_LETTER/deutsche_Combined_clean_trainlogs
I0425 10:41:39.858790 10847 Train.cpp:142] Experiment runidx: 1
I0425 10:41:39.859186 10847 Train.cpp:160] Number of classes (network) = 42
I0425 10:41:39.859203 10847 Train.cpp:171] Loading architecture file from /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/network.arch
I0425 10:41:40.549096 10847 Train.cpp:191] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> output]
    (0): View (-1 1 40 0)
    (1): Conv2D (40->256, 8x1, 2,1, SAME,SAME) (with bias)
    (2): ReLU
    (3): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (4): ReLU
    (5): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (6): ReLU
    (7): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (8): ReLU
    (9): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (10): ReLU
    (11): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (12): ReLU
    (13): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (14): ReLU
    (15): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (16): ReLU
    (17): Reorder (2,0,3,1)
    (18): Linear (256->512) (with bias)
    (19): ReLU
    (20): Linear (512->42) (with bias)
I0425 10:41:40.549119 10847 Train.cpp:192] [Network Params: 3907114]
I0425 10:41:40.549124 10847 Train.cpp:193] [Criterion] ConnectionistTemporalClassificationCriterion
I0425 10:41:40.549458 10847 NumberedFilesLoader.cpp:29] Adding dataset /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/output/data/train-clean-100 ...
I0425 10:41:40.549569 10847 NumberedFilesLoader.cpp:68] 70471 files found.
I0425 10:41:41.107362 10847 Utils.cpp:102] Filtered 0/70471 samples
I0425 10:41:41.112356 10847 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 35236
I0425 10:41:41.112501 10847 NumberedFilesLoader.cpp:29] Adding dataset /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/output/data/dev-clean ...
I0425 10:41:41.112643 10847 NumberedFilesLoader.cpp:68] 5181 files found.
I0425 10:41:41.154306 10847 Utils.cpp:102] Filtered 0/5181 samples
I0425 10:41:41.154740 10847 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 2591
I0425 10:41:41.245987 10847 Train.cpp:441] Shuffling trainset
I0425 10:41:41.248873 10847 Train.cpp:448] Epoch 1 started!
terminate called after throwing an instance of 'std::runtime_error'
  what():  Error: compute_ctc_loss, stat = cuda memcpy or memset failed
*** Aborted at 1556181705 (unix time) try "date -d @1556181705" if you are using GNU date ***
PC: @     0x7fe70f3a5428 gsignal
*** SIGABRT (@0x3e800002a5f) received by PID 10847 (TID 0x7fe78ecff800) from PID 10847; stack trace: ***
    @     0x7fe70f3a54b0 (unknown)
    @     0x7fe70f3a5428 gsignal
    @     0x7fe70f3a702a abort
    @     0x7fe70ff0a84d __gnu_cxx::__verbose_terminate_handler()
    @     0x7fe70ff086b6 (unknown)
    @     0x7fe70ff08701 std::terminate()
    @     0x7fe70ff08919 __cxa_throw
    @           0x525f5b w2l::(anonymous namespace)::throw_on_error()
    @           0x526d01 w2l::ConnectionistTemporalClassificationCriterion::forward()
    @           0x45fa3c _ZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEERNS0_19FirstOrderOptimizerES9_biE4_clES2_S5_S7_S9_S9_bi.constprop.8756
    @           0x418f72 main
    @     0x7fe70f390830 __libc_start_main
    @           0x45bb19 _start
Aborted


My train config file contains the following settings:

# Training config for Mini Librispeech
# Replace `[...]` with appropriate paths

--datadir=/media/home/megha/5_wav2letter/WAV_2_LETTER/
--tokensdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/
--rundir=/media/home/megha/5_wav2letter/WAV_2_LETTER/
--archdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/
--train=wav2letter/tutorials/output/data/train-clean-100
--valid=wav2letter/tutorials/output/data/dev-clean
--input=flac
--arch=network.arch
--tokens=wav2letter/tutorials/output/data/tokens.txt
--criterion=ctc
--lr=0.05
--lrcrit=0.006
--gamma=0.2
--momentum=0.8
--stepsize=1
--maxgradnorm=1.0
--replabel=2
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=4
--batchsize=4
--runname=deutsche_Combined_clean_trainlogs
--iter=100
--logtostderr=1
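One thing I am also checking on my side: whether every target in the .tkn files only uses tokens listed in tokens.txt, since out-of-vocabulary targets are one plausible (assumed, not confirmed) cause of strange compute_ctc_loss failures. A minimal sketch of that check, assuming the tutorial's numbered-files layout and the paths from the config above:

```python
# Hypothetical sanity check (not part of wav2letter): list any target token
# that does not appear in tokens.txt. Paths are taken from the config above.
import glob
import os

data_dir = "/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/output/data"

with open(os.path.join(data_dir, "tokens.txt")) as f:
    known = {line.strip() for line in f if line.strip()}

for tkn_path in sorted(glob.glob(os.path.join(data_dir, "train-clean-100", "*.tkn"))):
    with open(tkn_path) as f:
        unknown = set(f.read().split()) - known
    if unknown:
        print(tkn_path, "uses tokens not in tokens.txt:", sorted(unknown))
```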

@jacobkahn, kindly point out the mistake I have made. My GPU info is below:

:~/Desktop$ glxinfo -B
name of display: :0
display: :0  screen: 0
direct rendering: Yes
OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: GeForce GTX 1060 6GB/PCIe/SSE2
OpenGL core profile version string: 4.5.0 NVIDIA 410.79
OpenGL core profile shading language version string: 4.50 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile

OpenGL version string: 4.6.0 NVIDIA 410.79
OpenGL shading language version string: 4.60 NVIDIA
OpenGL context flags: (none)
OpenGL profile mask: (none)

OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 410.79
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20

I am happy to provide any further information needed. Kindly help me with this issue; I really want to start training soon :(

Thanks, Megha

megharangaswamy commented 5 years ago

As suggested in one of the threads, I kept only audio files smaller than 400 KB for training (roughly as in the sketch below). I still ended up with the following error:
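A minimal sketch of that size-based filtering; the .tkn/.wrd/.id companion extensions are assumed from the tutorial's numbered-files layout, and the destination directory is hypothetical:

```python
# Move every utterance whose .flac is >= 400 KB, along with its companion
# files, out of the training set so the loader no longer sees it.
import glob
import os
import shutil

train_dir = "/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/output/data/train-clean-100"
too_big_dir = os.path.join(train_dir, "too_big")  # hypothetical destination
os.makedirs(too_big_dir, exist_ok=True)

for flac in glob.glob(os.path.join(train_dir, "*.flac")):
    if os.path.getsize(flac) >= 400 * 1024:
        stem, _ = os.path.splitext(flac)
        for ext in (".flac", ".tkn", ".wrd", ".id"):
            if os.path.exists(stem + ext):
                shutil.move(stem + ext, too_big_dir)
```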

megha@megha-ERAZER-X4709-D-C519:/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean$ /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/build/Train train --flagsfile /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/train.cfg
I0425 11:30:18.199692  9725 Train.cpp:139] Gflags after parsing 
--flagfile=; --fromenv=; --tryfromenv=; --undefok=; --tab_completion_columns=80; --tab_completion_word=; --help=false; --helpfull=false; --helpmatch=; --helpon=; --helppackage=false; --helpshort=false; --helpxml=false; --version=false; --am=; --arch=network.arch; --archdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/; --attention=content; --attnWindow=no; --batchsize=1; --beamscore=25; --beamsize=2500; --channels=1; --criterion=ctc; --datadir=/media/home/megha/5_wav2letter/WAV_2_LETTER/; --dataorder=input; --devwin=0; --emission_dir=; --enable_distributed=false; --encoderdim=0; --eostoken=false; --everstoredb=false; --fftcachesize=1; --filterbanks=40; --flagsfile=/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/train.cfg; --forceendsil=false; --gamma=0.20000000000000001; --garbage=false; --input=flac; --inputbinsize=100; --inputfeeding=false; --iter=100; --itersave=false; --labelsmooth=0; --leftWindowSize=50; --lexicon=; --linlr=-1; --linlrcrit=-1; --linseg=0; --lm=; --lmtype=kenlm; --lmweight=1; --localnrmlleftctx=0; --localnrmlrightctx=0; --logadd=false; --lr=0.050000000000000003; --lrcrit=0.0060000000000000001; --maxdecoderoutputlen=200; --maxgradnorm=1; --maxisz=9223372036854775807; --maxload=-1; --maxrate=10; --maxsil=50; --maxtsz=9223372036854775807; --maxword=-1; --melfloor=1; --memstepsize=10485760; --mfcc=false; --mfcccoeffs=13; --mfsc=true; --minisz=0; --minrate=3; --minsil=0; --mintsz=0; --momentum=0.80000000000000004; --noresample=false; --nthread=1; --nthread_decoder=1; --onorm=target; --outputbinsize=5; --pctteacherforcing=100; --pcttraineval=100; --pow=false; --replabel=2; --reportiters=0; --rightWindowSize=50; --rndv_filepath=; --rundir=/media/home/megha/5_wav2letter/WAV_2_LETTER/; --runname=deutsche_Combined_clean_trainlogs; --samplerate=16000; --samplingstrategy=rand; --sclite=; --seed=0; --show=false; --showletters=false; --silweight=0; --skipoov=false; --smearing=none; --softwoffset=10; --softwrate=5; --softwstd=5; --sqnorm=true; --stepsize=1; --surround=|; --tag=; --target=tkn; --targettype=video; --test=; --tokens=wav2letter/tutorials/output/data/tokens.txt; --tokensdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/; --train=wav2letter/tutorials/output/data/train-clean-100; --trainWithWindow=false; --transdiag=0; --unkweight=-inf; --valid=wav2letter/tutorials/output/data/dev-clean; --weightdecay=0; --wordscore=1; --world_rank=0; --world_size=1; --alsologtoemail=; --alsologtostderr=false; --colorlogtostderr=false; --drop_log_memory=true; --log_backtrace_at=; --log_dir=; --log_link=; --log_prefix=true; --logbuflevel=0; --logbufsecs=30; --logemaillevel=999; --logfile_mode=436; --logmailer=/bin/mail; --logtostderr=true; --max_log_size=1800; --minloglevel=0; --stderrthreshold=2; --stop_logging_if_full_disk=false; --symbolize_stacktrace=true; --v=0; --vmodule=; 
I0425 11:30:18.199710  9725 Train.cpp:141] Experiment path: /media/home/megha/5_wav2letter/WAV_2_LETTER/deutsche_Combined_clean_trainlogs
I0425 11:30:18.199714  9725 Train.cpp:142] Experiment runidx: 1
I0425 11:30:18.200085  9725 Train.cpp:160] Number of classes (network) = 35
I0425 11:30:18.200101  9725 Train.cpp:171] Loading architecture file from /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/network.arch
I0425 11:30:18.877243  9725 Train.cpp:191] [Network] Sequential [input -> (0) -> (1) -> (2) -> (3) -> (4) -> (5) -> (6) -> (7) -> (8) -> (9) -> (10) -> (11) -> (12) -> (13) -> (14) -> (15) -> (16) -> (17) -> (18) -> (19) -> (20) -> output]
    (0): View (-1 1 40 0)
    (1): Conv2D (40->256, 8x1, 2,1, SAME,SAME) (with bias)
    (2): ReLU
    (3): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (4): ReLU
    (5): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (6): ReLU
    (7): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (8): ReLU
    (9): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (10): ReLU
    (11): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (12): ReLU
    (13): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (14): ReLU
    (15): Conv2D (256->256, 8x1, 1,1, SAME,SAME) (with bias)
    (16): ReLU
    (17): Reorder (2,0,3,1)
    (18): Linear (256->512) (with bias)
    (19): ReLU
    (20): Linear (512->35) (with bias)
I0425 11:30:18.877285  9725 Train.cpp:192] [Network Params: 3903523]
I0425 11:30:18.877290  9725 Train.cpp:193] [Criterion] ConnectionistTemporalClassificationCriterion
I0425 11:30:18.877542  9725 NumberedFilesLoader.cpp:29] Adding dataset /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/output/data/train-clean-100 ...
I0425 11:30:18.877698  9725 NumberedFilesLoader.cpp:68] 364 files found. 
I0425 11:30:18.881237  9725 Utils.cpp:102] Filtered 0/364 samples
I0425 11:30:18.881323  9725 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 364
I0425 11:30:18.881399  9725 NumberedFilesLoader.cpp:29] Adding dataset /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/output/data/dev-clean ...
I0425 11:30:18.881530  9725 NumberedFilesLoader.cpp:68] 5181 files found. 
I0425 11:30:18.925501  9725 Utils.cpp:102] Filtered 0/5181 samples
I0425 11:30:18.926044  9725 W2lNumberedFilesDataset.cpp:57] Total batches (i.e. iters): 5181
I0425 11:30:19.020403  9725 Train.cpp:441] Shuffling trainset
I0425 11:30:19.020476  9725 Train.cpp:448] Epoch 1 started!
terminate called after throwing an instance of 'std::invalid_argument'
  what():  CUDNN_STATUS_BAD_PARAM
*** Aborted at 1556184624 (unix time) try "date -d @1556184624" if you are using GNU date ***
PC: @     0x7f004681e428 gsignal
*** SIGABRT (@0x3e8000025fd) received by PID 9725 (TID 0x7f00c6178800) from PID 9725; stack trace: ***
    @     0x7f004681e4b0 (unknown)
    @     0x7f004681e428 gsignal
    @     0x7f004682002a abort
    @     0x7f004738384d __gnu_cxx::__verbose_terminate_handler()
    @     0x7f00473816b6 (unknown)
    @     0x7f0047381701 std::terminate()
    @     0x7f0047381919 __cxa_throw
    @           0x60a17d fl::TensorDescriptor::TensorDescriptor()
    @           0x608927 fl::conv2d()
    @           0x5e9836 fl::Conv2D::forward()
    @           0x5f7b9f fl::UnaryModule::forward()
    @           0x5e8832 fl::Sequential::forward()
    @           0x45e228 _ZZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEERNS0_19FirstOrderOptimizerES9_biE4_clES2_S5_S7_S9_S9_biENKUllddE0_clEldd
    @           0x45ff06 _ZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEERNS0_19FirstOrderOptimizerES9_biE4_clES2_S5_S7_S9_S9_bi.constprop.8756
    @           0x418f72 main
    @     0x7f0046809830 __libc_start_main
    @           0x45bb19 _start
Aborted
jacobkahn commented 5 years ago

@megharangaswamy — are you still seeing this? This could indicate a bunch of different things. Based on https://github.com/facebookresearch/wav2letter/issues/304, it seems like you've solved this, yes?

vineelpratap commented 5 years ago

Hi, since the token-set size is 35, I'm assuming you are using a custom dataset. I would try adding the --minisz 25 flag and see if it helps.

The error is being thrown in the Conv2D layer, most likely because the number of frames is zero, since the configuration of the architecture looks correct. A rough check for such too-short clips is sketched below.
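A rough way to spot those clips before training (a sketch, assuming the default 25 ms frame size and 10 ms frame stride for --mfsc and the stride-2 first convolution from the tutorial arch; soundfile is just one possible audio reader):

```python
# Estimate how many frames a clip yields after featurization and the
# stride-2 first convolution; clips yielding zero frames would make
# Conv2D/CTC fail as described above. Frame size/stride values are
# assumed defaults, not read from the config.
import soundfile as sf

def usable_frames(path, frame_ms=25.0, stride_ms=10.0, conv_stride=2):
    samples, rate = sf.read(path)
    duration_ms = 1000.0 * len(samples) / rate
    if duration_ms < frame_ms:
        return 0
    frames = 1 + int((duration_ms - frame_ms) // stride_ms)
    return frames // conv_stride

# Anything with usable_frames(f) == 0, or fewer frames than target tokens
# (which CTC also cannot align), is a candidate for --minisz filtering.
```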