andreimuntean / A3C

Deep reinforcement learning using an asynchronous advantage actor-critic (A3C) model.

Thanks for the code what commands .. #2

Closed mcbrs1a closed 6 years ago

mcbrs1a commented 6 years ago

What commands are required to run a small basic example? I have installed the prerequisites in an Anaconda environment, but I'm new to this process. So I run ./train.sh.

But how do I visualize what is happening? Is it possible to perform simple Q-learning for Atari games on a CPU with this code?

andreimuntean commented 6 years ago

Is it possible to perform simple Q-learning for Atari games on a CPU with this code?

A3C is a technique that's similar to Q-learning (but more complicated, in my opinion). Either way, this AI can certainly learn to play Atari games. Try Pong-v0 and see the neat results within a few hours. As for the last part of your question: this project runs only on CPUs. If you're interested in deep Q-learning with GPUs (although that also works with CPUs), see here.

But how do I visualize what is happening?

I made the train.sh script start a tmux environment. Use tmux attach -t a3c to enter the environment and monitor the agents and tmux kill-session -t a3c to stop the environment (and training). The script also starts TensorBoard on port 15000. So while training, you can go to http://localhost:15000 and see plots of how the agents are performing.

If you want to see the agents actually playing the games, run ./train.sh --render. Note that this significantly slows down training, so I suggest just running ./train.sh then occasionally stopping and running ./train.sh --render to see how much they've improved.

One more thing: OpenAI Gym is constantly changing, so make sure that you've installed gym version 0.8 and TensorFlow 1.0 -- older versions will not work with this project, and newer versions very likely won't work either.
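If you want to sanity-check your environment before training, a small pre-flight script can compare installed versions against the pinned ones. This is a hypothetical helper, not part of the repo; `is_compatible` and `version_tuple` are names I made up for illustration.

```python
# Hypothetical pre-flight check (not part of this project): verify that
# installed package versions match the major.minor versions the code expects.

def version_tuple(version):
    """Turn a version string like '0.8.1' into a comparable tuple (0, 8, 1)."""
    return tuple(int(part) for part in version.split('.')[:3])

def is_compatible(installed, required):
    """True if the installed major.minor version matches the required one."""
    return version_tuple(installed)[:2] == version_tuple(required)[:2]

# This project wants gym 0.8.x and TensorFlow 1.0.x.
print(is_compatible('0.8.1', '0.8'))   # → True
print(is_compatible('1.1.0', '1.0'))   # → False
```

In practice you would pass in `gym.__version__` and `tf.__version__` instead of the literals above.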

mcbrs1a commented 6 years ago

Thanks for the very helpful reply. I am running the A3C example. You mention occasionally stopping the run to see how things have improved; does this mean killing the session? Are the results somehow stored if I do this?

Also, ./train.sh --render doesn't give me any visuals, but it does start the training process. Am I missing something?

Sorry if this is basic

andreimuntean commented 6 years ago

The results are stored in the "models" directory, which is created after running ./train.sh and periodically updated. Closing and running ./train.sh again will continue from the last checkpoint.
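If you want to see which checkpoint training will resume from, a quick way is to look at the newest file in that directory. This is a sketch I'm adding for illustration, not code from the repo; `latest_checkpoint` is a hypothetical name.

```python
import os

# Hypothetical helper (not from this repo): report the most recently
# modified file in the "models" checkpoint directory.
def latest_checkpoint(models_dir='models'):
    """Return the path of the newest file in models_dir, or None if empty."""
    paths = [os.path.join(models_dir, name) for name in os.listdir(models_dir)]
    files = [p for p in paths if os.path.isfile(p)]
    return max(files, key=os.path.getmtime) if files else None
```

Running this between training sessions shows the checkpoint that the next ./train.sh invocation will pick up.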

To my surprise, ./train.sh --render causes a parsing exception. You can see the error message (thread.py: error: argument --render: expected one argument) by looking into the tmux instance using tmux attach -t a3c.

Looks like the right way to enable the --render option is to also assign a value to it, for example: ./train.sh --render=1.
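For anyone curious why a bare --render fails: this is standard argparse behavior when an option is declared to take a value rather than act as a boolean switch. The snippet below is a minimal sketch I wrote to reproduce the symptom, not the project's actual argument parser.

```python
import argparse

# Minimal sketch (assumed, not the project's actual parser): an option
# declared with a default but no store_true action expects exactly one
# value, so a bare --render triggers "expected one argument".
parser = argparse.ArgumentParser()
parser.add_argument('--render', default='0')

args = parser.parse_args(['--render=1'])
print(args.render)  # → 1
```

Passing --render with no value makes argparse exit with the same "expected one argument" error seen in the tmux pane, which is why --render=1 works.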