ChuaCheowHuan / reinforcement_learning

My reproduction of various reinforcement learning algorithms (DQN variants, A3C, DPPO, RND with PPO) in TensorFlow.
https://chuacheowhuan.github.io/
MIT License

What's in this repository?

This repository contains code that I reproduced (while learning RL) for various reinforcement learning algorithms. The code was tested on Colab.

If GitHub fails to render the Jupyter notebooks (a known GitHub issue), click here to view them on Jupyter's nbviewer.


Implemented Algorithms

| Algorithm | Discrete | Continuous | Multithreaded | Multiprocessing | Tested on |
| --- | --- | --- | --- | --- | --- |
| DQN | :heavy_check_mark: | | | | CartPole-v0 |
| Double DQN (DDQN) | :heavy_check_mark: | | | | CartPole-v0 |
| Dueling DDQN | :heavy_check_mark: | | | | CartPole-v0 |
| Dueling DDQN + PER | :heavy_check_mark: | | | | CartPole-v0 |
| A3C (1) | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: | :heavy_check_mark: (3) | CartPole-v0, Pendulum-v0 |
| DPPO (2) | | :heavy_check_mark: | | :heavy_check_mark: (3) | Pendulum-v0 |
| RND + PPO | | :heavy_check_mark: | | | MountainCarContinuous-v0 (4), Pendulum-v0 (5) |

(1): N-step returns are used for the critic's target (see the sketch after this list).

(2): GAE is used to compute the TD(lambda) return (the critic's target) and the policy's advantage (see the sketch after this list).

(3): Distributed TensorFlow and Python's multiprocessing package are used.

(4): State featurization (approximating the feature map of an RBF kernel) is used (see the sketch after this list).

(5): A fast-slow LSTM with a heavily simplified, VAE-like "variational unit" (VU) is used.
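
For orientation, here is a minimal NumPy sketch (not code from this repository) of how the n-step return in (1) and the GAE-based TD(lambda) return and advantage in (2) are typically computed over a rollout segment. The function and variable names (`rewards`, `values`, `bootstrap_value`, `gamma`, `lam`) are illustrative assumptions, not the names used in the implementations here.

```python
import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns over a rollout segment, bootstrapped with the
    critic's value estimate V(s_{t+n}) at the cut-off state."""
    returns = []
    running = bootstrap_value                    # V(s_{t+n}) from the critic
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    return np.asarray(returns[::-1])

def gae(rewards, values, bootstrap_value, gamma=0.99, lam=0.95):
    """Generalised Advantage Estimation. Returns the advantages and the
    TD(lambda) returns (advantage + value), usable as the critic's target."""
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.append(np.asarray(values, dtype=np.float64), bootstrap_value)
    deltas = rewards + gamma * values[1:] - values[:-1]   # one-step TD errors
    advantages = np.zeros_like(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = deltas[t] + gamma * lam * running
        advantages[t] = running
    return advantages, advantages + values[:-1]
```

Likewise, a common way to build the RBF state featurization mentioned in (4) is scikit-learn's `RBFSampler`. The sketch below assumes that approach; the placeholder observations and the `gamma`/`n_components` values are illustrative, not the repository's actual settings.

```python
import numpy as np
from sklearn.kernel_approximation import RBFSampler
from sklearn.pipeline import FeatureUnion
from sklearn.preprocessing import StandardScaler

# Placeholder observations; in practice these would be states sampled from
# the environment (e.g. env.observation_space.sample()).
observation_examples = np.random.uniform(-1.0, 1.0, size=(10000, 2))

scaler = StandardScaler().fit(observation_examples)
featurizer = FeatureUnion([
    ("rbf1", RBFSampler(gamma=5.0, n_components=100)),
    ("rbf2", RBFSampler(gamma=1.0, n_components=100)),
])
featurizer.fit(scaler.transform(observation_examples))

def featurize_state(state):
    """Map a raw state to its approximate RBF feature representation."""
    scaled = scaler.transform(np.atleast_2d(state))
    return featurizer.transform(scaled)[0]
```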


misc folder

The misc folder contains related example code that I put together while learning RL. See the README.md in the misc folder for more details.


Blog

Check out my blog for more information on my repositories.