flashlight / wav2letter

Facebook AI Research's Automatic Speech Recognition Toolkit
https://github.com/facebookresearch/wav2letter/wiki

training - logs #160

Closed megharangaswamy closed 5 years ago

megharangaswamy commented 5 years ago

Hi,

I started my training today. Below are my training command and train.cfg, respectively. Train command:

/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/build/Train train --flagsfile /media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/train.cfg

train.cfg

# Training config for Mini Librispeech
# Replace `[...]` with appropriate paths
--datadir=/media/home/megha/5_wav2letter/WAV_2_LETTER/
--tokensdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/
--rundir=/media/home/megha/5_wav2letter/WAV_2_LETTER/
--archdir=/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/tutorials/1-librispeech_clean/
--train=wav2letter/tutorials/output/data/train-clean-100
--valid=wav2letter/tutorials/output/data/dev-clean
--input=flac
--arch=network.arch
--tokens=wav2letter/tutorials/output/data/tokens.txt
--criterion=ctc
--lr=0.1
--maxgradnorm=1.0
--replabel=2
--surround=|
--onorm=target
--sqnorm=true
--mfsc=true
--filterbanks=40
--nthread=4
--batchsize=4
--runname=librispeech_clean_trainlogs
--iter=100

1) It's been 3 hours and I don't see any logs on my console. However, a folder called librispeech_clean_trainlogs was created. Is this the only place to see logs?

2) I want to make sure that I started my training correctly and that the lack of console logs isn't due to some issue, e.g. my system being stuck in an infinite loop. Can you please confirm?

3) If I stop my training, can I continue it from where I left off? Are there any checkpoint options available?

4) For the 1-librispeech_clean dataset, what is the expected training time?

Thanks :)

jacobkahn commented 5 years ago

@megharangaswamy, can you try setting the --reportiters flag to a low value (like 1)? This will log output after each iteration, so you'll be able to tell whether training is making any progress at all. I'd also try adding --logtostderr=1 (sometimes glog doesn't flush output to the console properly).
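For instance, a rough sketch of the two extra lines you could append to the train.cfg you pasted above (values are just illustrative):

# report/log after every iteration instead of only at epoch boundaries
--reportiters=1
# ask glog to write to stderr so output isn't buffered away from the console
--logtostderr=1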

  1. That said, LibriSpeech is a big dataset and training can be quite slow. If you're training on a CPU with a big model, it may take several hours to complete an epoch.
  2. You can also inspect the training loop manually by adding logging here as samples are processed.
  3. You can continue training by using the continue mode with Train (rather than the train mode, which starts training from scratch); see the sketch after this list.
  4. This has massive variance depending on your hardware.
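A minimal sketch of what resuming might look like, assuming the run directory Train created from your --rundir and --runname (i.e. the librispeech_clean_trainlogs folder you mentioned); adjust the path to wherever your checkpoints actually live:

/media/home/megha/5_wav2letter/WAV_2_LETTER/wav2letter/build/Train continue /media/home/megha/5_wav2letter/WAV_2_LETTER/librispeech_clean_trainlogs

The continue mode resumes from the last model saved in that directory, whereas the train mode you used always starts from scratch.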