Closed: the01 closed this issue 5 years ago
Please reduce the LR to 0.01 and turn off LARC. What was your training "num_epochs"?
I assume removing "larc_params" from base_params turns it off? I believe it was 200 or 250. Do I need a higher value? For how many GPUs is this setting intended?
This learning rate is for 2 GPUs. Yes, you can remove larc parameters.
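For reference, a minimal sketch of what that change could look like in an OpenSeq2Seq-style config (the surrounding keys and values here are assumptions, not the actual Wave2Letter+ config):

```python
# Hypothetical excerpt of an OpenSeq2Seq-style config dict.
# The real config contains many more keys; these values are illustrative.
base_params = {
    "num_gpus": 2,
    "num_epochs": 200,
    "lr_policy_params": {
        "learning_rate": 0.01,  # reduced per the advice above
    },
    # "larc_params": {...},  # removed entirely to turn LARC off
}

# LARC is off simply because the key is absent:
assert "larc_params" not in base_params
```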
Ok, I changed the LR, removed larc_params, and turned Horovod off, but it now throws W tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found.
errors. Do I need to compile TensorFlow myself and use the CTC decoder with an LM?
I noticed that the Wave2LetterEncoder was changed to TDNNEncoder.
Hi @borisgin, I am also getting the warning tensorflow/core/util/ctc/ctc_loss_calculator.cc:144] No valid path found, but I am positive that I compiled TensorFlow with the KenLM configuration as specified in the docs.
Does this affect training, or is it just a warning emitted when the output is decoded into the transcription?
Thanks
TensorFlow is ok; this warning means that training diverged, or that maybe an audio sequence is too short.
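One quick way to check the "too short" case: CTC can only produce a transcript if the encoder emits at least one time step per character, plus a blank between each pair of adjacent repeated characters. A rough Python sanity check, where the window stride and the encoder's time reduction are hypothetical values, not the model's actual ones:

```python
def min_ctc_frames(transcript: str) -> int:
    """Minimum number of time steps CTC needs to emit this transcript:
    one step per character, plus a blank between adjacent repeats."""
    repeats = sum(1 for a, b in zip(transcript, transcript[1:]) if a == b)
    return len(transcript) + repeats

def output_frames(duration_s: float, window_stride_s: float = 0.01,
                  time_reduction: int = 2) -> int:
    """Approximate encoder output length: spectrogram frames divided by
    the encoder's total stride (both parameters here are assumptions)."""
    return int(duration_s / window_stride_s) // time_reduction

# 'hello' needs 6 steps because 'll' requires a blank in between:
assert min_ctc_frames("hello") == 6
# A 1-second clip at a 10 ms stride and 2x reduction yields 50 frames:
assert output_frames(1.0) == 50
```

If `output_frames(duration)` falls below `min_ctc_frames(transcript)` for some utterances, CTC has no valid alignment path and emits exactly this warning; filtering those clips from the dataset usually silences it.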
Thanks for the reply @borisgin. Does that mean my model is not being trained? Also, I have a huge dataset of audio files; my system has 64 GB of RAM and two NVIDIA 1080 Ti GPUs running Ubuntu 16.04. I have noticed that when I try to run multi-GPU training (i.e. by setting num_gpus=2 in the config), training gets stuck at the initializing state: the model is loaded on both GPUs, but the usage of both sits at 0%.
Thanks.
I was attempting to reproduce the results of the Wave2Letter+ model from the docs using this config as a basis. I only have 2 Titan X GPUs available, so I couldn't use mixed precision and changed the number of GPUs to 2. I was only able to achieve a WER of around 16%, far from the reported 6.67%.
I assume I need a different learning rate (etc.) for training with 1, 2, 4, ... GPUs? What configuration should I use to get a similar result?
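A common heuristic for this (not necessarily what the reference config's authors used, so treat both the rule and the numbers below as assumptions) is to scale the learning rate linearly with the number of GPUs relative to the reference setup, keeping the per-GPU batch size fixed:

```python
def scale_lr(base_lr: float, base_gpus: int, gpus: int) -> float:
    """Linear learning-rate scaling heuristic: keep lr / global_batch
    roughly constant when the per-GPU batch size is unchanged.
    Both reference values are hypothetical, not taken from the docs."""
    return base_lr * gpus / base_gpus

# If a hypothetical base LR of 0.05 was tuned for 8 GPUs,
# a 2-GPU run might start around a quarter of it:
assert scale_lr(0.05, 8, 2) == 0.0125
```

With fewer GPUs the global batch shrinks, so either lowering the LR this way or raising num_epochs (more optimizer steps at the smaller batch) is usually needed to approach the reported WER.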