dgriff777 / rl_a3c_pytorch

A3C LSTM Atari with Pytorch plus A3G design

How long have you trained the model #15

Closed: 1601214542 closed this issue 6 years ago

1601214542 commented 7 years ago

Cool project! But how long did you train for each Atari game? I have trained SpaceInvaders-v0 for 13 hours with 16 CPUs, but the reward is still around 790, whereas according to the original paper the score reaches about 1400 by hour 14. How can I train faster? Another issue is that the network architecture is different from the one proposed in the original paper. Does that affect performance? Thank you!

dgriff777 commented 7 years ago

Thanks! First off, the v0 version is a much harder environment to train on than the one in the paper: actions are randomly repeated 25% of the time and the frame skip is stochastically sampled from 2-4. To match the environment in the paper, use SpaceInvadersDeterministic-v4, as that seems closest to what was used there.
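
For example, a rough sketch of the difference (assuming a classic gym Atari install; attribute names may differ in newer gym/ale-py versions):

```python
import gym

# The "-v0" ID uses sticky actions (the previous action is repeated with
# probability 0.25) and a frame skip sampled stochastically from 2-4, while
# "Deterministic-v4" uses a fixed frame skip and no sticky actions.
hard_env = gym.make("SpaceInvaders-v0")
paper_like_env = gym.make("SpaceInvadersDeterministic-v4")

# On classic gym Atari builds the frame-skip setting is visible directly:
print(hard_env.unwrapped.frameskip)        # a (low, high) range, e.g. (2, 5)
print(paper_like_env.unwrapped.frameskip)  # a fixed integer
```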

Also set the arg --count-lives True, as lives are episodic in the paper's setup, which particularly helps with Space Invaders.
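
Roughly, the idea (a minimal sketch of the assumed behavior, not the repo's exact code; the `ale.lives()` access assumes classic gym Atari) is to treat losing a life as the end of a training episode:

```python
import gym

env = gym.make("SpaceInvadersDeterministic-v4")
env.reset()
prev_lives = env.unwrapped.ale.lives()

done = False
while not done:
    obs, reward, done, info = env.step(env.action_space.sample())
    lives = env.unwrapped.ale.lives()
    if lives < prev_lives:
        done = True   # a lost life terminates the training episode
    prev_lives = lives
```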

Yes, the architecture is different. I tried to make improvements where I could. It's a larger network, built to tackle the particularly hard v0 environments, and it performs better than the model in the paper.

1601214542 commented 7 years ago

Thanks! I also want to share some comments from Dr. Mnih (an author of the paper): https://github.com/muupan/async-rl/wiki. For example, the learning rate decreases linearly over training.
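
As an illustration of that linear schedule (my own sketch, not code from this repo or from those notes), the rate simply interpolates from its initial value down to zero over the total number of training steps:

```python
def linearly_annealed_lr(initial_lr, step, total_steps):
    """Linear anneal: initial_lr at step 0, zero at total_steps."""
    return initial_lr * max(0.0, 1.0 - step / float(total_steps))

# e.g. an initial rate of 7e-4 is halved midway through a 10M-step run
print(linearly_annealed_lr(7e-4, 5_000_000, 10_000_000))  # -> 0.00035
```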

Besides, I have found a possible small error on line 73 of shared_optim.py, where `avg = squareavg.sqrt().add(group['eps'])` should perhaps be `avg = squareavg.add(group['eps']).sqrt()`.

Also, can you tell me where to find the game versions corresponding to the original paper? For example, the paper reports performance on Space Invaders, but which version does that correspond to, v0 or SpaceInvadersDeterministic-v4? Where can I find information like that? Thank you!

dgriff777 commented 7 years ago

> `avg = squareavg.sqrt().add(group['eps'])` may be `avg = squareavg.add(group['eps']).sqrt()`

It's correct as it is. Eps should be on the outside of the square root, as it's there to avoid dividing by zero.
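
For reference, a minimal sketch of the update step in question (not the exact shared_optim.py code); adding eps after the square root is also what stock torch.optim.RMSprop does, so the denominator can never be zero:

```python
import torch

def rmsprop_step(param, grad, square_avg, lr=7e-4, alpha=0.99, eps=1e-8):
    # running average of squared gradients
    square_avg.mul_(alpha).addcmul_(grad, grad, value=1 - alpha)
    # eps is added *outside* the sqrt, so the denominator is always >= eps
    avg = square_avg.sqrt().add_(eps)
    param.addcdiv_(grad, avg, value=-lr)

param = torch.zeros(3)
grad = torch.randn(3)
square_avg = torch.zeros(3)
rmsprop_step(param, grad, square_avg)
```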

Here is DeepMind's alewrap; it should give you info on their implementation: https://github.com/deepmind/alewrap/blob/master/alewrap/GameEnvironment.lua

Though they have not been super consistent about their setup from paper to paper.