Grzego / async-rl

Variation of "Asynchronous Methods for Deep Reinforcement Learning" with multiple processes generating experience for the agent (Keras + Theano + OpenAI Gym) [1-step Q-learning, n-step Q-learning, A3C]
MIT License

Little glitch with play.py #3

Closed · mklissa closed this issue 7 years ago

mklissa commented 7 years ago

Hi there, awesome work! I noticed that play.py fails due to a shape mismatch between the policy output and the action space in your choose_action() function. However, by limiting the action space to 4 actions, the script works fine (see the sketch below)! By the way, how long does it take to achieve results with your algorithm (and on what machine)?
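
For illustration, a minimal sketch of the workaround, assuming NumPy; `choose_action` here is a hypothetical helper, not the repository's actual function:

```python
import numpy as np

def choose_action(policy, n_actions):
    """Sample an action, keeping only the first n_actions entries of the policy vector."""
    # Truncate the policy output (e.g. 6 values from a pretrained model) to the
    # number of actions the current environment actually exposes.
    p = np.asarray(policy[:n_actions], dtype=np.float64)
    p /= p.sum()  # renormalize so the probabilities sum to 1 again
    return np.random.choice(n_actions, p=p)

# Example: a pretrained model outputs 6 probabilities, but the env exposes only 4 actions.
# action = choose_action(model_output, env.action_space.n)
```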

lucasliunju commented 7 years ago

Hi,

Thanks for your wonderful work! I have a question: is the action space continuous or discrete?

Thanks!

Grzego commented 7 years ago

Hi,

@mklissa yes, there might be a problem with action_space when using my pretrained models. Because I coded it on Windows, I needed to compile ALE (Arcade-Learning-Environment) myself to make atari_py and OpenAI Gym work, and as it turned out later, OpenAI made a few changes in their ALE code that I wasn't aware of (for example, the action_space in Breakout: in the original ALE there are 4 actions, in atari_py there are 6). But if you decide to train a model yourself, there should be no problems with action_space.
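
To see which variant your installation uses, a quick check (sketch assuming an older Gym version where `Breakout-v0` and `get_action_meanings()` are available on Atari environments):

```python
import gym

env = gym.make('Breakout-v0')
print(env.action_space)                      # e.g. Discrete(4) or Discrete(6)
print(env.unwrapped.get_action_meanings())   # maps action indices to ALE action names
```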

I ran it on an i5-6600K CPU and a GTX 1080 GPU (usage: ~100% CPU and ~35% GPU), and for A3C it took 23 hours to process 80 million frames (I used 4 processes). As for results, the last time I tested it on Breakout, it scored about 10 points after 6 million frames and about 30 points after 10 million frames (which took about 3 hours).

@lucasliunju it works with discrete action spaces.

Thanks!

mklissa commented 7 years ago

Hi Grzego,

Thank you for your detailed answer. That makes sense! I can confirm that the model works without any issues, and that after running it I saw results just as good as described.

Great work!