baidu-research / ba-dls-deepspeech

Apache License 2.0

Getting Loss as NaN #20

Closed Nishanksingla closed 7 years ago

Nishanksingla commented 7 years ago

Hi,

I am getting the loss as NaN after 110 iterations in epoch 1. Can anyone please provide a good configuration (e.g., good hyperparameter values or a learning rate) that results in good model accuracy?

Also, I was thinking of using the 5k-iteration model provided in the repo for transfer learning. Can anyone please tell me how I can use transfer learning when training this model? Any help would be much appreciated. :)
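(For reference, the usual Keras warm-start pattern is to rebuild the same architecture and load the saved weights into it; a sketch, with the builder and checkpoint filename as placeholders since the repo's actual names may differ:)

```python
# Sketch of Keras warm-starting (the common transfer-learning pattern);
# build_model() and the weights filename are placeholders, not names
# taken from this repo.
model = build_model(recur_layers=7, nodes=1000)  # must match the checkpoint's architecture
model.load_weights('model_5000_weights.h5')      # hypothetical path to the 5k-iteration weights
# Resume training from here, typically with a lower learning rate.
```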

a00achild1 commented 7 years ago

@Nishanksingla Did you train on the LibriSpeech database? If so, I trained with the following parameters: #layers: 7, #nodes: 1000, base_lr: 1e-4, clipnorm: 200. These parameters made the loss converge.
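A minimal, self-contained sketch of these settings in plain Keras. This is an illustrative stand-in, not the ba-dls-deepspeech training code (which wires up its own CTC cost); the feature and output dimensions below are assumptions, not values taken from this repo:

```python
from keras.models import Sequential
from keras.layers import GRU, TimeDistributed, Dense
from keras.optimizers import SGD

FEATURE_DIM = 161   # assumed spectrogram bins per frame
OUTPUT_DIM = 29     # assumed characters + CTC blank
RECUR_LAYERS = 7    # "#layers: 7"
NODES = 1000        # "#nodes: 1000"

model = Sequential()
model.add(GRU(NODES, return_sequences=True,
              input_shape=(None, FEATURE_DIM)))
for _ in range(RECUR_LAYERS - 1):
    model.add(GRU(NODES, return_sequences=True))
model.add(TimeDistributed(Dense(OUTPUT_DIM, activation='softmax')))

# clipnorm caps the gradient norm; it is the setting most directly
# responsible for keeping the loss from diverging to NaN.
model.compile(loss='categorical_crossentropy',      # stand-in for the CTC cost
              optimizer=SGD(lr=1e-4, clipnorm=200))  # base_lr: 1e-4, clipnorm: 200
```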

But I have no idea how to use transfer learning. I would like to know, too.

Nishanksingla commented 7 years ago

@a00achild1 Thank you for the reply and sorry for replying so late.

Yes, I am using the LibriSpeech database. Do you know if it is possible to use all the GPU cards to train the model so that it converges faster? In Caffe, for example, there is a "-gpu all" option for training the model.

srvinay commented 7 years ago

Using multiple GPUs isn't supported with the Theano backend through Keras. You could switch to TensorFlow (see: https://www.tensorflow.org/tutorials/using_gpu#using_multiple_gpus).
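A minimal sketch of the explicit device-placement pattern from that tutorial (TensorFlow 1.x API; this is the tutorial's toy example, not ba-dls-deepspeech code):

```python
import tensorflow as tf

results = []
for device in ['/gpu:0', '/gpu:1']:
    with tf.device(device):
        a = tf.constant([1.0, 2.0, 3.0, 4.0], shape=[2, 2])
        b = tf.constant([1.0, 2.0], shape=[2, 1])
        results.append(tf.matmul(a, b))  # one matmul per GPU

with tf.device('/cpu:0'):
    total = tf.add_n(results)  # gather the per-GPU results on the CPU

# allow_soft_placement falls back to another device if a GPU is unavailable
with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
    print(sess.run(total))
```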

stan-sack commented 7 years ago

@a00achild1 After how many iterations did you see convergence? I'm at 1400 iterations training on LibriSpeech clean-100 (Keras optimizer) with your parameters, and my loss looks like this:

[plot: training loss curve]