farizrahman4u / qlearning4k

Q-learning for Keras

NaN Loss on train #13

Open bglick13 opened 7 years ago

bglick13 commented 7 years ago

I have been getting NaN losses when I try to train my agent on a game. I tracked it back to the get_batch function in memory.py: Y (the model's predicted Q-values) turns to all NaNs about halfway through the first epoch. I haven't been able to figure it out from there, though.
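A check like this is enough to catch it (illustrative only, not qlearning4k's actual code; `model` and `states` stand for the Q-network and the sampled batch in a get_batch-style loop):

```python
import numpy as np

# Predict Q-values for the sampled batch and fail fast if any are NaN.
Y = model.predict(states)
if np.isnan(Y).any():
    raise ValueError("NaN detected in Q-value predictions")
```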

Any suggestion would be much appreciated. This package is fantastic!

farizrahman4u commented 7 years ago

Please specify the model you are using.

bglick13 commented 7 years ago

Hi, thank you for the response.

I am using essentially the example from the README, but with my own game. I figured out that if I lower the learning rate dramatically, to 0.001, it fixes the problem. Could there be something in the way I've designed my game that would cause the discrepancy? I'd rather use a slightly larger learning rate if possible.
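Concretely, this is the kind of change I mean (the model below is a stand-in in the style of the README example, not my actual game):

```python
from keras.models import Sequential
from keras.layers import Dense, Flatten
from keras.optimizers import SGD

# Illustrative shapes; not the real game.
nb_frames, grid_size, nb_actions = 4, 10, 5

model = Sequential()
model.add(Flatten(input_shape=(nb_frames, grid_size, grid_size)))
model.add(Dense(100, activation='relu'))
model.add(Dense(nb_actions))

# Dropping the SGD learning rate to 0.001 is what stops
# the loss from going to NaN.
model.compile(optimizer=SGD(lr=0.001), loss='mse')
```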

Thanks again

farizrahman4u commented 7 years ago

Some sort of normalization of your reward might help. Paste your code here, I will take a look.
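For example, one common form of this is to clip the reward into [-1, 1] before it enters the Q-learning target, as the original DQN paper does. A minimal sketch, assuming your game returns its raw score from get_score (the clipping itself is a suggestion, not existing qlearning4k behavior):

```python
import numpy as np

def clip_reward(score, lo=-1.0, hi=1.0):
    """Clip the raw game score into [lo, hi].

    Bounding the reward bounds the Q-target
    r + gamma * max_a Q(s', a), so larger learning rates
    are far less likely to diverge.
    """
    return float(np.clip(score, lo, hi))
```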

IntQuant commented 7 years ago

I found this in the logs just before the loss became NaN:

```
Epoch 143/1000 | Loss 2214.4102 | Epsilon 0.00 | Win count 65
Epoch 144/1000 | Loss 6051275231349243379712.0000 | Epsilon 0.00 | Win count 66
Epoch 145/1000 | Loss 7.3589 | Epsilon 0.00 | Win count 67
Epoch 146/1000 | Loss 11.0253 | Epsilon 0.00 | Win count 68
Epoch 147/1000 | Loss 33.1732 | Epsilon 0.00 | Win count 68
Epoch 148/1000 | Loss 32.7043 | Epsilon 0.00 | Win count 68
Epoch 149/1000 | Loss 3.5222 | Epsilon 0.00 | Win count 69
/usr/local/lib/python3.5/dist-packages/qlearning4k/memory.py:56: RuntimeWarning: invalid value encountered in multiply
  targets = (1 - delta) * Y[:batch_size] + delta * (r + gamma * (1 - game_over) * Qsa)
Epoch 150/1000 | Loss nan | Epsilon 0.00 | Win count 69
```
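The warning line is the Q-learning target computation in memory.py: `targets = (1 - delta) * Y[:batch_size] + delta * (r + gamma * (1 - game_over) * Qsa)`, where delta appears to mask the action actually taken, so "invalid value encountered in multiply" means NaNs are already present in Y or Qsa by that point. Note also the loss jumping about twenty orders of magnitude at epoch 144 before the NaN at epoch 150. Besides lowering the learning rate, gradient clipping on the optimizer bounds the update when the TD error spikes like that. A sketch using Keras's standard clipnorm argument (the learning rate here is illustrative):

```python
from keras.optimizers import SGD

# clipnorm rescales each gradient so its L2 norm is at most 1.0,
# capping the update size even when the TD error spikes as in
# the log above.
optimizer = SGD(lr=0.01, clipnorm=1.0)

# `model` is the Q-network being trained.
model.compile(optimizer=optimizer, loss='mse')
```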

code.py.zip