Zeta36 / Asynchronous-Methods-for-Deep-Reinforcement-Learning

Based on a paper from Google DeepMind, I've developed a new version of the DQN that uses thread-based exploration instead of experience replay, as explained here: http://arxiv.org/pdf/1602.01783v1.pdf. I used the one-step Q-learning pseudocode, and we can now train on the Pong game in less than 20 hours, without any GPU or network distribution.
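For context, the one-step Q-learning target used by the paper looks roughly like this sketch (the function name and arguments are illustrative, not taken from the repository; the asynchronous variant accumulates gradients of the loss against this target across threads before applying them):

```python
import numpy as np

def one_step_q_target(reward, next_q_values, done, gamma=0.99):
    """Target for one-step Q-learning:
    y = r if the episode ended, else r + gamma * max_a' Q(s', a'; theta^-).
    """
    if done:
        return reward
    return reward + gamma * np.max(next_q_values)
```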

Training in process/core level parallelism #3

Open thisiscam opened 8 years ago

thisiscam commented 8 years ago

Hi @Zeta36

Great project! I'm trying to run some experiments with the code. Currently the code uses Python threading with TensorFlow, and from my observation the training loop does not actually run in parallel, because the learners run on threads instead of processes (so the Python-side work is serialized by the GIL). Ideally, each learner should be in a separate process to fully utilize a modern machine. A sketch of the current pattern is below.
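Roughly, the pattern in the repository looks like this (a minimal sketch against the TensorFlow 1.x API; the names `learner_loop` and `num_learners` are illustrative, not taken from the code):

```python
import threading
import tensorflow as tf

num_learners = 8  # illustrative thread count

# One graph and one session shared by all learner threads.
sess = tf.Session()

def learner_loop(thread_id):
    """Each thread steps its own environment and trains the shared graph.

    sess.run() releases the GIL while the TF runtime executes, but the
    Python-side work (environment stepping, preprocessing, the loop itself)
    is serialized by the GIL, so the threads mostly take turns.
    """
    for step in range(1000):
        pass  # act in the env, then sess.run(train_op, feed_dict=...)

threads = [threading.Thread(target=learner_loop, args=(i,))
           for i in range(num_learners)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```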

This might be relevant: http://stackoverflow.com/questions/34900246/tensorflow-passing-a-session-to-a-python-multiprocess

But it looks like bad news: I can't just spawn a bunch of processes and have them share the same TensorFlow session. So maybe a distributed TensorFlow session is what we need: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/distributed/index.md
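For what it's worth, distributed TensorFlow can also be pointed at a single machine: each learner process starts a `tf.train.Server` on a localhost port and connects its session to the shared cluster. A minimal sketch, assuming the `tf.train.ClusterSpec` / `tf.train.Server` API from the how-to above (the ports, job name, and variable are made up for illustration):

```python
import tensorflow as tf

# Two "worker" tasks on the same physical machine, one per process.
cluster = tf.train.ClusterSpec({
    "worker": ["localhost:2222", "localhost:2223"],
})

# Each learner process runs this with its own task_index (0 or 1 here).
task_index = 0
server = tf.train.Server(cluster, job_name="worker", task_index=task_index)

# Variables pinned to one task are shared by every session that connects
# to the same cluster, which gives shared parameters across processes.
with tf.device("/job:worker/task:0"):
    global_step = tf.Variable(0, name="global_step")

with tf.Session(server.target) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(global_step))
```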

ahundt commented 7 years ago

I've run this as well, and as @thisiscam mentions, it doesn't appear to actually run in parallel with good utilization. When I run the program, most Python threads sit at 5% core utilization except for one thread at 97%, which means that collectively only about two cores are actually in use.

ahundt commented 7 years ago

@thisiscam distributed TensorFlow, as per your link, is for many physical machines networked together; before taking that approach, it is important to fully utilize the capabilities of a single machine.

ahundt commented 7 years ago

TensorFlow's threading and queue mechanism is more likely the right way to go: https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/threading_and_queues/index.md
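The pattern from that how-to looks roughly like this (a sketch against the TF 1.x queue API; the shapes, capacity, and thread count are arbitrary stand-ins for this project's Atari frame stacks):

```python
import threading
import numpy as np
import tensorflow as tf

# A bounded queue that actor threads fill and the trainer drains.
queue = tf.FIFOQueue(capacity=100, dtypes=[tf.float32], shapes=[[84, 84, 4]])
state_ph = tf.placeholder(tf.float32, [84, 84, 4])
enqueue_op = queue.enqueue([state_ph])
dequeue_op = queue.dequeue()

sess = tf.Session()
coord = tf.train.Coordinator()

def actor():
    # Stand-in for a preprocessed Atari frame stack.
    frame = np.zeros((84, 84, 4), dtype=np.float32)
    try:
        while not coord.should_stop():
            sess.run(enqueue_op, feed_dict={state_ph: frame})
    except tf.errors.CancelledError:
        pass  # the queue was closed during shutdown

threads = [threading.Thread(target=actor) for _ in range(4)]
for t in threads:
    t.start()

for _ in range(10):
    state = sess.run(dequeue_op)  # the trainer consumes states here

coord.request_stop()
sess.run(queue.close(cancel_pending_enqueues=True))
coord.join(threads)
```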

ahundt commented 7 years ago

@thisiscam I saw you made some changes in a branch here: https://github.com/thisiscam/Asynchronous-Methods-for-Deep-Reinforcement-Learning/tree/ale

But it looks like you forgot to commit a file for some of the functions, like load_ale(), which is simply not present.

thisiscam commented 7 years ago

load_ale comes from ale_python_interface: https://github.com/bbitmaster/ale_python_interface
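Since the helper itself isn't in the branch, here is a hypothetical reconstruction of what it presumably does, based on the ale_python_interface API (the function body, ROM path, and seed are my guesses, not the missing file):

```python
from ale_python_interface import ALEInterface

def load_ale(rom_path="roms/pong.bin", seed=123):
    """Hypothetical sketch: create and configure one ALE emulator
    instance for a learner."""
    ale = ALEInterface()
    ale.setInt("random_seed", seed)
    ale.loadROM(rom_path)
    return ale

ale = load_ale()
actions = ale.getMinimalActionSet()
reward = ale.act(actions[0])  # take one emulator step
if ale.game_over():
    ale.reset_game()
```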

However, in my experiments I have not yet found hyperparameters that work well. It might be due to some bug in the code.