NVlabs / GA3C

Hybrid CPU/GPU implementation of the A3C algorithm for deep reinforcement learning.
BSD 3-Clause "New" or "Revised" License

Lstm nativ #25

Closed etienne87 closed 7 years ago

etienne87 commented 7 years ago

Re-merged my changes with the latest version of GA3C.

Current results (with Config.USE_RNN set to False, True, and True + summing lstm_state and cnn_state as a residual connection): the LSTM does not look very helpful in the Pong case, as the problem is for the most part fully observable.

results.txt

EDIT: bug in the code: logits_v and logits_p still take the dense layer as input, so the RNN output is never used; the experiment needs to be restarted.
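For reference, a minimal sketch of the intended wiring (not the actual PR code; names such as conv_out, cnn_flat, use_residual, and num_actions are illustrative assumptions), where both heads read the LSTM output, optionally summed with the CNN features as a residual connection, instead of the dense layer:

```python
import tensorflow as tf

num_actions = 6      # e.g. Pong's action space (illustrative)
use_residual = True  # the "lstm_state + cnn_state" residual variant described above

# Stand-in for the flattened conv features; in GA3C these come from the conv stack.
conv_out = tf.placeholder(tf.float32, [None, 3136], name='conv_out')
cnn_flat = tf.layers.dense(conv_out, 256, activation=tf.nn.relu, name='d1')

# Unroll an LSTM over the time dimension (a single-episode batch here, for brevity).
lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(256)
lstm_in = tf.reshape(cnn_flat, [1, -1, 256])                     # [1, time, 256]
lstm_out, lstm_state = tf.nn.dynamic_rnn(lstm_cell, lstm_in, dtype=tf.float32)
lstm_out = tf.reshape(lstm_out, [-1, 256])

# Optional residual connection: sum the recurrent and feed-forward features.
features = lstm_out + cnn_flat if use_residual else lstm_out

# The bug above: both heads were still fed cnn_flat, so the RNN had no effect.
# They should read `features` instead:
logits_p = tf.layers.dense(features, num_actions, name='logits_p')
logits_v = tf.layers.dense(features, 1, name='logits_v')
```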

4SkyNet commented 7 years ago

I don't think the LSTM is unhelpful for Pong (based on my experiments with vanilla A3C), but with a straightforward setup you won't see any improvement. For example: using the same stacked state with a small internal unroll loop. It's better to increase this loop to at least 20 steps and not stack images; or you can keep stacking frames and then decrease the loop length a bit.

So... I haven't inspected the code closely, but I assume it is still TIME_MAX = 5.
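A sketch of the change being suggested, assuming the upstream GA3C option names TIME_MAX and STACKED_FRAMES in Config.py plus the USE_RNN flag added in this PR (exact names may differ):

```python
# Config.py (sketch)
TIME_MAX = 20        # unroll/update length; upstream default is 5
STACKED_FRAMES = 1   # let the LSTM carry temporal context instead of frame stacking
USE_RNN = True       # enable the recurrent head from this PR
```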

etienne87 commented 7 years ago

@4SkyNet: ok, will try with TIME_MAX=20. Do you know if standard NN regularization techniques (weight decay, dropout, batchnorm) can affect convergence and/or performance?

etienne87 commented 7 years ago

Follow-up results on Pong: results.txt

4SkyNet commented 7 years ago

@etienne87 I haven't experimented with them in an A3C context, but I know that DDPG uses batchnorm for improvements. Also, thanks for the experiments. lstm_20 starts improving earlier but then its curve is not as steep as the old feed-forward run; I don't know for sure, but that could happen if you are still stacking the images. You could also try ELU instead of ReLU, I think.
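For what it's worth, a small sketch of the ReLU-to-ELU swap in the conv trunk (layer sizes and names are illustrative, not taken from GA3C):

```python
import tensorflow as tf

def conv_trunk(frames, activation=tf.nn.elu):
    # Same trunk shape as a typical Atari A3C net, with ELU instead of ReLU.
    c1 = tf.layers.conv2d(frames, 16, 8, strides=4, activation=activation, name='conv1')
    c2 = tf.layers.conv2d(c1, 32, 4, strides=2, activation=activation, name='conv2')
    return tf.layers.flatten(c2)
```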

etienne87 commented 7 years ago

I'm closing this attempt, as something is not right with my code.