Closed 6 years ago
Thanks! First, the v0 version is a much harder environment to train on than the one in the paper: actions are randomly repeated 25% of the time, and the frame skip is stochastically sampled from 2 to 4. To match the paper's environment, use SpaceInvadersDeterministic-v4, as that seems closest to what was used in the paper.
Also set the argument --count-lives True, since each life is treated as an episode in the paper, and this particularly helps with Space Invaders.
Yes, the architecture is different. I tried to make improvements to it where I could. It's a larger network, meant to tackle the particularly hard v0 environments. It performs better than the model in the paper.
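For anyone wondering what that difference amounts to, here is a rough toy sketch in plain Python. It is not the gym or ALE code: `simulate_step` and the two settings dicts are hypothetical names, the repeat is applied once per agent step as a simplification, and the fixed frame skip of 3 for the Deterministic variant is an assumption on my part.

```python
import random


def simulate_step(action, prev_action, rng, repeat_prob=0.25, frameskip=(2, 4)):
    """Toy model of one agent step: returns (executed_action, n_frames).

    Hypothetical helper for illustration only, not part of gym or ALE.
    """
    # With probability repeat_prob the previous action is executed instead.
    executed = prev_action if rng.random() < repeat_prob else action
    if isinstance(frameskip, int):
        n_frames = frameskip  # fixed frame skip
    else:
        # Frame skip sampled uniformly, both ends inclusive.
        n_frames = rng.randint(frameskip[0], frameskip[1])
    return executed, n_frames


# Assumed settings, named after the behaviour described in this thread.
V0_LIKE = dict(repeat_prob=0.25, frameskip=(2, 4))
DETERMINISTIC_LIKE = dict(repeat_prob=0.0, frameskip=3)

rng = random.Random(0)
prev = 0
for requested in [1, 2, 3, 4, 5]:
    executed, n_frames = simulate_step(requested, prev, rng, **V0_LIKE)
    print(requested, executed, n_frames)
    prev = executed
```

With the v0-like settings the agent sometimes sees its previous action executed again and cannot rely on a fixed number of frames per step, which is why it is harder to learn on.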
Thanks! I also want to share some comments given by Dr. Mnih (an author of the paper): https://github.com/muupan/async-rl/wiki. The learning rate decreases linearly.
Besides, I may have found a small error on line 73 of shared_optim.py, where "avg = squareavg.sqrt().add(group['eps'])" should perhaps be "avg = squareavg.add(group['eps']).sqrt()".
Also, can you tell me where to find which game version the original paper used? For example, the paper reports performance on Space Invaders, but for which version: v0 or SpaceInvadersDeterministic-v4? Where can I find information like that? Thank you!
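A minimal sketch of what a linearly decreasing learning rate could look like. The function name `linear_lr`, the end value of zero, and the `total_steps` horizon are my assumptions for illustration, not something taken from this repo.

```python
def linear_lr(initial_lr, step, total_steps):
    """Linearly decay the learning rate from initial_lr toward 0.

    Hypothetical helper: the zero end value and total_steps are assumptions.
    """
    # Clamp progress to [0, 1] so the rate never goes negative.
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial_lr * (1.0 - frac)


# Example: halfway through training the rate is half the initial value.
print(linear_lr(1e-4, 50, 100))
```

In a PyTorch training loop one would then write the returned value into each `param_group['lr']` of the optimizer before the update step.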
"avg = squareavg.sqrt().add(group['eps'])" may be "avg = squareavg.add(group['eps']).sqrt()"
It's correct as it is. Eps should be on the outside, as it's there to avoid dividing by zero.
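To make the two placements concrete, here is a small standalone sketch (plain Python, not the code from shared_optim.py; `rmsprop_denominator` and its `eps_inside` flag are hypothetical). It compares the denominator with eps added after the square root, as in the line quoted above, against eps added under the square root, as in the suggested change.

```python
import math


def rmsprop_denominator(square_avg, eps=1e-8, eps_inside=False):
    """Denominator of an RMSprop-style update for a single scalar.

    Hypothetical helper for illustration; eps_inside selects the variant.
    """
    if eps_inside:
        # eps under the square root: sqrt(v + eps)
        return math.sqrt(square_avg + eps)
    # eps outside the square root: sqrt(v) + eps
    return math.sqrt(square_avg) + eps


# When the running average of squared gradients is zero, both variants are
# non-zero, but they differ by several orders of magnitude.
outside = rmsprop_denominator(0.0)
inside = rmsprop_denominator(0.0, eps_inside=True)
print(outside, inside)
```

So the two forms are not interchangeable for the same eps value: switching the placement changes the effective step size when the squared-gradient average is small.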
Here is DeepMind's alewrap, which should give you info on their implementation: https://github.com/deepmind/alewrap/blob/master/alewrap/GameEnvironment.lua
though they have not been super consistent in their setup from paper to paper.
Cool project! But how long did you train for each Atari game? I have trained SpaceInvaders-v0 for 13 hours with 16 CPUs, but the reward is still at 790. However, according to the original paper, performance reaches 1400 by hour 14. How can I train faster? Another issue is that the network architecture differs from the one proposed in the original paper. Does that affect the performance? Thank you!