This Advantage Actor-Critic (A2C) RL agent is based on the Asynchronous Advantage Actor-Critic (A3C) agent in *Deep Reinforcement Learning in Action*, but with tuned hyperparameters and without asynchronous processing.
A2C agents combine a value network (the critic), like the Deep Q-network (DQN) used by DeepMind, with a policy network (the actor), like REINFORCE. They provide direct sampling of actions from a probability distribution (like a policy network) whilst also supporting rapid online learning (like a DQN, but without the need for experience replay or a target network).
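As a rough sketch of how the two components combine in a single update, the actor is trained on the advantage-weighted log-probability of the sampled action while the critic regresses the observed return. The function name, shapes, and the critic weighting below are illustrative assumptions, not code from this repository:

```python
import torch

def a2c_loss(log_prob, value, ret, critic_weight=0.5):
    """Combined actor-critic loss for a batch of transitions (illustrative).

    log_prob : log pi(a|s) for the sampled action(s)
    value    : critic estimate V(s)
    ret      : observed (bootstrapped) return G
    """
    advantage = ret - value.detach()            # A = G - V(s); no gradient into critic here
    actor_loss = -(log_prob * advantage).mean() # policy-gradient term
    critic_loss = (ret - value).pow(2).mean()   # value-regression term
    return actor_loss + critic_weight * critic_loss
```

Because the advantage is detached, the critic is updated only through the regression term, while the actor is pushed towards actions that did better than the critic expected.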
The A2C agent learns to play the Cart Pole game environment in Gymnasium:
*OpenAI Gym, OpenAI, 2022*
The agent is a two-headed feed-forward neural network:
*Deep Reinforcement Learning in Action, Manning, 2020*
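A two-headed network of this kind can be sketched in PyTorch as follows. The layer sizes, and the assumption of Cart Pole's 4-dimensional observation and 2 discrete actions, are illustrative and not necessarily those used in this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoHeadedNet(nn.Module):
    """Shared body with separate policy (actor) and value (critic) heads."""

    def __init__(self, obs_dim=4, n_actions=2, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)  # policy head: action log-probabilities
        self.critic = nn.Linear(hidden, 1)         # value head: V(s)

    def forward(self, obs):
        h = self.shared(obs)
        return F.log_softmax(self.actor(h), dim=-1), self.critic(h)
```

Sharing the body means both heads learn from the same state features, which is part of what makes A2C sample-efficient compared with training two separate networks.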
Here the agent is being trained to play Cart Pole.
And here the trained agent is playing the game unaided:
The A2C agent can be used to play any game, as long as `main.py` is updated to initialise the agent and correctly handle rewards. There is a branch of this repository that shows how it can be successfully trained to play the Lunar Lander game in Gymnasium.
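The per-step loop that would need adapting for a new environment might look like the sketch below. The `model` and `env` objects, and the termination penalty, are assumptions about what such a loop involves, not code from `main.py`; only the 5-tuple returned by `env.step` follows the real Gymnasium API:

```python
import torch
from torch.distributions import Categorical

def step(model, env, obs):
    """One environment step: sample an action from the policy and shape the reward."""
    log_probs, value = model(torch.as_tensor(obs, dtype=torch.float32))
    dist = Categorical(logits=log_probs)
    action = dist.sample()
    obs, reward, terminated, truncated, info = env.step(action.item())
    # Per-environment reward handling goes here; penalising termination
    # is purely illustrative and will differ per game.
    if terminated:
        reward = -10.0
    return obs, reward, terminated or truncated, dist.log_prob(action), value
```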
Although A3C and PPO agents can perform better than A2C agents, they include additional complexity that makes the fundamentals of RL more difficult to understand when reading the code. This A2C agent is designed to be a reference for how to implement a Deep RL (DRL) agent using PyTorch. If you want a PPO agent, I recommend using the implementation in Stable Baselines 3. There is a branch of this repository that shows how to implement an equivalent A2C agent using Stable Baselines 3; to convert that agent to a PPO agent, simply replace instances of `A2C` with instances of `PPO`.
Prerequisites: Python 3.10.

To install, run `install.sh`, or on Windows:

`install.bat`

To train the agent:

```
(venv) >python -m actorcritic --train --render
```