flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

Issue on model fine-tuning: sota/2019/librispeech #761

Open Rootian opened 4 years ago

Rootian commented 4 years ago

Hi, I'm using the fork command on am_resnet_ctc_librispeech_dev_other.bin to adapt the model to my own dataset, and I get the following error, which says the loss has NaN values.

I0723 11:44:53.263063  2870 W2lListFilesDataset.cpp:147] 2703 files found.
I0723 11:44:53.263103  2870 Utils.cpp:102] Filtered 37/2703 samples
I0723 11:44:53.263499  2870 W2lListFilesDataset.cpp:62] Total batches (i.e. iters): 2666
I0723 11:44:53.267585  2870 Train.cpp:564] Shuffling trainset ::: 28515
I0723 11:44:53.271889  2870 Train.cpp:571] Epoch 1 started!
F0723 11:44:54.493821  2870 Train.cpp:616] Loss has NaN values. Samples - train-clean-100-4014-186175-0018
*** Check failure stack trace: ***
    @     0x7efd5faca0cd  google::LogMessage::Fail()
    @     0x7efd5facbf33  google::LogMessage::SendToLog()
    @     0x7efd5fac9c28  google::LogMessage::Flush()
    @     0x7efd5facc999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7efd6a6d3ce7  _ZZ4mainENKUlSt10shared_ptrIN2fl6ModuleEES_IN3w2l17SequenceCriterionEES_INS3_10W2lDatasetEES_INS0_19FirstOrderOptimizerEES9_ddblE3_clES2_S5_S7_S9_S9_ddbl
    @     0x7efd6a668ca8  main
    @     0x7efd5edafb97  __libc_start_main
    @     0x7efd6a6cd10a  _start
Aborted (core dumped)
root@fc6464776c28:~/wav2letter#

I tried to debug the source code; the audio samples and the list file are read into the trainset successfully. Could you help me find out the problem?

And here is my train config, train-office.cfg:

root@fc6464776c28:~/wav2letter# cat train-office.cfg
# Training config for Mini Librispeech
# Replace `[...]` with appropriate paths
--datadir=/root/wav2letter/
--rundir=/root/wav2letter/training/
--archdir=/root/wav2letter/pre_model/
--train=lists/train-clean-100.lst
--valid=lists/dev-clean.lst
--input=wav
--arch=am_resnet_ctc.arch
--tokensdir=/root/wav2letter/pre_model
--tokens=librispeech-train-all-unigram-10000.tokens
--lexicon=/root/wav2letter/pre_model/librispeech-train+dev-unigram-10000-nbest10.lexicon
--criterion=ctc
--wordseparator=_
--usewordpiece=true
--sampletarget=0.1
--lr=0.4
--linseg=0
--maxgradnorm=1.0
--replabel=1
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--lrcosine
--nthread=4
--batchsize=1
--runname=talk51_trainlogs
--iter=500
--mintsz=2
--minisz=2

I've also tried setting --iter to 10000000 and setting other params as in train_am_transformer_ctc.cfg from sota/2019/librispeech, but I still get the same error.

tlikhomanenko commented 4 years ago

Are you running this on Librispeech data (because in your config the train data is specified as train-clean-100.lst)? Could you show the running command itself and the full log after you run it (it seems you are training from scratch, not fine-tuning the model)?
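For context, training from scratch and fine-tuning use different modes of the Train binary; roughly (assuming the default build layout, with placeholder paths):

# train: builds a fresh model from --arch and trains it from scratch
wav2letter/build/Train train --flagsfile=train-office.cfg

# fork: loads an existing model binary and fine-tunes it on the data in --train
wav2letter/build/Train fork /root/wav2letter/pre_model/am_resnet_ctc_librispeech_dev_other.bin --flagsfile=train-office.cfg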

Dr-AyanDebnath commented 4 years ago

Can I use audio samples in wav format with the CTC criterion? In the example, flac is used for CTC, so my question is: can I use wav for CTC? @tlikhomanenko

tlikhomanenko commented 4 years ago

Yep, wav format is supported, feel free to use it (for example, the TIMIT recipe uses wav files).
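Nothing changes in the list file format either; each line stays [sample id] [audio path] [length in ms] [transcription], just with a wav path (the id, path, and duration below are made-up placeholders):

my-sample-0001 /path/to/data/my-sample-0001.wav 12340.5 hello world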

Dr-AyanDebnath commented 4 years ago

I solved the "Loss has NaN values" issue by reducing lr to 0.001 @Rootian. Link for reference: https://github.com/facebookresearch/wav2letter/issues/334
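In the config above that is a one-line change:

# in train-office.cfg, replace --lr=0.4 with a smaller learning rate:
--lr=0.001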

Thanks @tlikhomanenko, I will try training with the CTC criterion on wav files.