Disclaimer: my implementation is currently unstable (you can refer to the learning curve below), and I'm not sure whether the problem is on my side. All comments are welcome, and feel free to contact me!
This code aims to solve some control problems, especially in Mujoco, and is heavily based on pytorch-a3c. The main difference from pytorch-a3c is that this repo handles the continuous action spaces of Mujoco tasks rather than the discrete actions of the Atari domain.
Note that this repo is only compatible with Mujoco environments in OpenAI Gym; if you want to train an agent in the Atari domain, please refer to pytorch-a3c.
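For context, a continuous-control policy outputs a distribution over real-valued actions instead of a softmax over discrete ones. Below is a minimal sketch of a Gaussian policy head; the class and layer names are illustrative only and are not taken from this repo:

```python
import torch
import torch.nn as nn

class GaussianPolicyHead(nn.Module):
    """Maps a hidden feature vector to a Normal distribution over actions."""

    def __init__(self, hidden_size, action_dim):
        super().__init__()
        self.mu = nn.Linear(hidden_size, action_dim)            # action mean
        self.log_sigma = nn.Parameter(torch.zeros(action_dim))  # learned log std

    def forward(self, h):
        mu = torch.tanh(self.mu(h))  # squash the mean into [-1, 1]
        sigma = self.log_sigma.exp().expand_as(mu)
        return torch.distributions.Normal(mu, sigma)

# Usage: dist = head(hidden); action = dist.sample()
# log_prob = dist.log_prob(action).sum(-1)  # summed over action dimensions
```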
There are three tasks/modes available: train, eval, and develop.
```
python main.py --env-name InvertedPendulum-v1 --num-processes 16 --task train
```
```
python main.py --env-name InvertedPendulum-v1 --task eval --display True --load_ckpt ckpt/a3c/InvertedPendulum-v1.a3c.100
```
You can choose whether to render the environment with the --display flag.
```
python main.py --env-name InvertedPendulum-v1 --num-processes 16 --task develop
```
In case you want to check whether your code runs as expected, you might resort to pdb. For that purpose, I provide a develop mode, which runs in a single process (easy to debug).
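For example, you could drop a breakpoint anywhere in the training loop (the exact location is up to you; pdb is part of the Python standard library):

```python
import pdb

# Pause here and inspect variables interactively; this works cleanly in
# develop mode because only one process is attached to the terminal.
pdb.set_trace()
```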
The plot of total reward/episode length over 1000 steps:
In InvertedPendulum-v1, the total reward is exactly equal to the episode length, since the agent receives a reward of +1 for every step the pendulum stays upright.
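As a quick sanity check of that claim, assuming the classic four-value `env.step` return that Gym used at the time (gym and Mujoco must be installed):

```python
import gym

# Every step the pendulum stays upright yields a reward of +1, so the
# episode return equals the number of steps survived.
env = gym.make("InvertedPendulum-v1")
env.reset()
total_reward, steps, done = 0.0, 0, False
while not done:
    _, reward, done, _ = env.step(env.action_space.sample())
    total_reward += reward
    steps += 1

assert total_reward == steps  # total reward == episode length
```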
Note that the x-axis denotes wall-clock time in minutes.
The above curve is plotted with:

```
python plot.py --log_path ./logs/a3c/InvertedPendulum-v1.a3c.log
```
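If you want to roll your own plot instead, a minimal sketch with matplotlib might look like the following. The log format here is an assumption (one `minutes reward` pair per line); check the actual .log file and adapt the parsing accordingly:

```python
import matplotlib.pyplot as plt

times, rewards = [], []
with open("./logs/a3c/InvertedPendulum-v1.a3c.log") as f:
    for line in f:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip lines that don't match the assumed format
        times.append(float(parts[0]))    # wall-clock time in minutes (assumed)
        rewards.append(float(parts[1]))  # total reward of the episode (assumed)

plt.plot(times, rewards)
plt.xlabel("time (minutes)")
plt.ylabel("total reward / episode length")
plt.savefig("learning_curve.png")
```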
<img src="http://img.youtube.com/vi/E7QlRIkKuXo/0.jpg" alt="IMAGE ALT TEXT HERE" width="480" height="360" border="10" />
<img src="http://img.youtube.com/vi/WNiitHoz8x4/0.jpg" alt="IMAGE ALT TEXT HERE" width="480" height="360" border="10" />
I implemented ShareRMSProp in my_optim.py, but I haven't tried it yet.
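For reference, the idea behind a shared RMSprop (as described in the A3C paper) is to keep the optimizer's running statistics in shared memory so that all worker processes update a single copy. Below is a sketch of that idea; the actual ShareRMSProp in my_optim.py may differ:

```python
import torch
import torch.optim as optim

class SharedRMSprop(optim.RMSprop):
    """RMSprop whose running statistics live in shared memory, so all A3C
    worker processes update a single copy of the optimizer state.
    A sketch of the idea only, not this repo's implementation."""

    def __init__(self, params, lr=1e-4, alpha=0.99, eps=1e-8):
        super().__init__(params, lr=lr, alpha=alpha, eps=eps)
        # Create the per-parameter state eagerly (it is normally created
        # lazily on the first step) so it can be shared before forking.
        for group in self.param_groups:
            for p in group["params"]:
                state = self.state[p]
                state["step"] = torch.tensor(0.0)
                state["square_avg"] = torch.zeros_like(p.data)

    def share_memory(self):
        # Only square_avg enters the RMSprop update rule, so it is the
        # buffer that must be visible to every worker process.
        for group in self.param_groups:
            for p in group["params"]:
                self.state[p]["square_avg"].share_memory_()
```

In a typical A3C setup you would construct this once in the parent process over the shared model's parameters, call `share_memory()`, and then pass the optimizer to every worker.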