This Advantage Actor-Critic (A2C) RL agent is based on the Asynchronous Advantage Actor-Critic (A3C) agent in *Deep Reinforcement Learning in Action*, but with tuned hyperparameters and without asynchronous processing.
A2C agents combine a value network (the critic), like the Deep Q-network (DQN) used by DeepMind, with a policy network (the actor), like REINFORCE. They provide direct sampling of actions from a probability distribution (like a policy network) whilst also supporting rapid online learning (like a DQN, but without the need for experience replay or a target network).
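As a rough sketch of how the two components combine in a single update, the actor is trained on the advantage-weighted log-probability of the sampled action while the critic regresses the observed return. The function name, shapes, and the critic weighting below are illustrative assumptions, not code from this repository:

```python
import torch

def a2c_loss(log_prob, value, ret, critic_weight=0.5):
    """Combined actor-critic loss for a batch of transitions (illustrative).

    log_prob : log pi(a|s) for the sampled action(s)
    value    : critic estimate V(s)
    ret      : observed (bootstrapped) return G
    """
    advantage = ret - value.detach()            # A = G - V(s); no gradient into critic here
    actor_loss = -(log_prob * advantage).mean() # policy-gradient term
    critic_loss = (ret - value).pow(2).mean()   # value-regression term
    return actor_loss + critic_weight * critic_loss
```

Because the advantage is detached, the critic is updated only through the regression term, while the actor is pushed towards actions that did better than the critic expected.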
The A2C agent learns to play the Cart Pole game environment in Gymnasium:
*OpenAI Gym, OpenAI, 2022*
The agent is a two-headed feed-forward neural network:
*Deep Reinforcement Learning in Action, Manning, 2020*
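A two-headed network of this kind can be sketched in PyTorch as follows. The layer sizes, and the assumption of Cart Pole's 4-dimensional observation and 2 discrete actions, are illustrative and not necessarily those used in this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadedNet(nn.Module):
    """Shared body with separate policy (actor) and value (critic) heads."""

    def __init__(self, obs_dim=4, n_actions=2, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)  # policy head: action log-probabilities
        self.critic = nn.Linear(hidden, 1)         # value head: V(s)

    def forward(self, obs):
        h = self.shared(obs)
        return F.log_softmax(self.actor(h), dim=-1), self.critic(h)
```

Sharing the body means both heads learn from the same state features, which is part of what makes A2C sample-efficient compared with training two separate networks.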
Here the agent is being trained to play Cart Pole.
And here the trained agent is playing the game unaided:
The A2C agent can be used to play any game, as long as `main.py` is updated to initialise the agent and correctly handle rewards. There is a branch of this repository that shows how it can be successfully trained to play the Lunar Lander game in Gymnasium.
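The per-step loop that would need adapting for a new environment might look like the sketch below. The `model` and `env` objects, and the termination penalty, are assumptions about what such a loop involves, not code from `main.py`; only the 5-tuple returned by `env.step` follows the real Gymnasium API:

```python
import torch
from torch.distributions import Categorical

def step(model, env, obs):
    """One environment step: sample an action from the policy and shape the reward."""
    log_probs, value = model(torch.as_tensor(obs, dtype=torch.float32))
    dist = Categorical(logits=log_probs)
    action = dist.sample()
    obs, reward, terminated, truncated, info = env.step(action.item())
    # Per-environment reward handling goes here; penalising termination
    # is purely illustrative and will differ per game.
    if terminated:
        reward = -10.0
    return obs, reward, terminated or truncated, dist.log_prob(action), value
```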
Although A3C and PPO agents can perform better than A2C agents, they include additional complexity that makes the fundamentals of RL more difficult to understand when reading the code. This A2C agent is designed to be a reference for how to implement a Deep RL (DRL) agent using PyTorch. If you want a PPO agent, I recommend using the implementation in Stable Baselines 3. There is a branch of this repository that shows how to implement an equivalent A2C agent using Stable Baselines 3; to convert that agent to a PPO agent, simply replace instances of `A2C` with instances of `PPO`.
Prerequisites: Python 3.10.

To install, run `install.sh`, or on Windows:

`install.bat`

To train the agent:

```
(venv) >python -m actorcritic --train --render
```