Closed. mcbrs1a closed this 6 years ago.
Is it possible to perform simple Q-learning for Atari games on a CPU with this code?
A3C is a technique that's similar to Q-learning (though more complicated, in my opinion). Either way, this AI can certainly learn to play Atari games. Try Pong-v0 and you should see neat results within a few hours. As for the last part of your question: this project runs only on CPUs. If you're interested in deep Q-learning with GPUs (which also works on CPUs), see here.
But how do I visualize what is happening?
I made the train.sh script start a tmux environment. Use tmux attach -t a3c to enter the environment and monitor the agents, and tmux kill-session -t a3c to stop the environment (and training). The script also starts TensorBoard on port 15000, so while training you can go to http://localhost:15000 and see plots of how the agents are performing.
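To check from a script whether TensorBoard is listening before opening the browser, something like the following could be used. This is only a sketch: port_is_open is a hypothetical helper, not part of this project, and the host and port (localhost, 15000) are simply the ones mentioned above.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection raises OSError (e.g. connection refused) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Port 15000 is the TensorBoard port mentioned in the reply above.
    if port_is_open("localhost", 15000):
        print("TensorBoard port is reachable: http://localhost:15000")
    else:
        print("Nothing is listening on port 15000 yet")
```

An open port only shows that something is listening there. It does not confirm that the process is TensorBoard or that any plots have been written yet.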
If you want to see the agents actually playing the games, run ./train.sh --render. Note that this significantly slows down training, so I suggest just running ./train.sh, then occasionally stopping and running ./train.sh --render to see how much they've improved.
One more thing: OpenAI Gym is constantly changing, so make sure that you've installed gym version 0.8 and TensorFlow 1.0. Older versions will not work with this project, and newer versions very likely won't either.
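A quick way to catch a version mismatch before training is to compare the installed version strings against the ones named above. The sketch below is not part of the project. The helper names are hypothetical, and reading "0.8" and "1.0" as "any 0.8.x" and "any 1.0.x" is an assumption.

```python
def version_matches(installed, required):
    """True if `installed` starts with the dotted components of `required`.

    Compares component by component, so "0.10.0" does not match "0.8"
    and "1.10.0" does not match "1.0".
    """
    installed_parts = installed.split(".")
    required_parts = required.split(".")
    return installed_parts[:len(required_parts)] == required_parts


# Versions named in the thread; matching on the prefix is an assumption.
REQUIRED = {"gym": "0.8", "tensorflow": "1.0"}


def find_mismatches(installed_versions, required=REQUIRED):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name, wanted in required.items():
        found = installed_versions.get(name)
        if found is None:
            problems.append("{} is not installed (need {})".format(name, wanted))
        elif not version_matches(found, wanted):
            problems.append("{} {} found, need {}".format(name, found, wanted))
    return problems
```

Inside the training environment, the dictionary could be filled from gym.__version__ and tensorflow.__version__, assuming both packages expose that attribute in the installed releases.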
Thanks for the very helpful reply. I am running the a3c example. You mention occasionally stopping the run to see how things have improved. Does this mean killing the session? Are the results somehow stored if I do this?
Also, ./train.sh --render doesn't give me any visuals, but it does start the training process. Am I missing something?
Sorry if this is basic.
The results are stored in the "models" directory, which is created after running ./train.sh and periodically updated. Closing and running ./train.sh again will continue from the last checkpoint.
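To confirm that checkpoints are still being written before stopping a run, one option is to look at the most recently modified file under the models directory. This is a rough sketch with a hypothetical helper. It assumes nothing about how the checkpoint files are named and only looks at modification times.

```python
import os
import time


def newest_file(directory):
    """Return (path, mtime) of the most recently modified file, or None."""
    newest = None
    # os.walk yields nothing for a missing directory, so None is returned.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            mtime = os.path.getmtime(path)
            if newest is None or mtime > newest[1]:
                newest = (path, mtime)
    return newest


if __name__ == "__main__":
    result = newest_file("models")
    if result is None:
        print("No files under models/ yet")
    else:
        path, mtime = result
        age = time.time() - mtime
        print("{} was written {:.0f} seconds ago".format(path, age))
```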
To my surprise, ./train.sh --render causes a parsing exception. You can see the parsing exception (thread.py: error: argument --render: expected one argument) by looking into the tmux instance using tmux attach -t a3c.
Looks like the right way of enabling the --render option is to also assign a value to it, for example: ./train.sh --render=1.
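The error text has the shape that Python's argparse produces when an option declared to take a value is passed without one. The actual definition in thread.py is not shown in this thread, so the sketch below only assumes that --render is declared roughly like this (an integer option defaulting to 0), and parse_render is a hypothetical helper for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="thread.py")
    # Assumed definition: giving the option a type means it expects a value.
    parser.add_argument("--render", type=int, default=0)
    return parser


def parse_render(argv):
    """Return the parsed --render value, or None if argparse rejects argv."""
    try:
        return build_parser().parse_args(argv).render
    except SystemExit:
        # argparse prints the usage and error to stderr, then exits.
        return None


if __name__ == "__main__":
    print(parse_render(["--render=1"]))  # value supplied, parses fine
    print(parse_render(["--render"]))    # "expected one argument" on stderr
```

Under this assumption, declaring the option with action="store_true" instead would make the bare --render form work, but --render=1 would then be rejected.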
What commands are required to run a small basic example? I have installed the prerequisites in an Anaconda environment, but I am new to this process. So I run ./train.sh.
But how do I visualize what is happening? And is it possible to perform simple Q-learning for Atari games on a CPU with this code?