ClementPerroud / Rainbow-Agent

Replication of the Rainbow reinforcement learning agent in TensorFlow 2, from the paper "Rainbow: Combining Improvements in Deep Reinforcement Learning"
MIT License

Combining PPO agent with gym_trading_env #1

Open martin0 opened 6 months ago

martin0 commented 6 months ago

Hi,

I have been studying reinforcement learning a little. I was aiming to combine the Proximal Policy Optimization (PPO) example from https://github.com/philtabor/Youtube-Code-Repository/tree/master/ReinforcementLearning/PolicyGradient/PPO/tf2 with your gym_trading_env environment.
I'm new to Gym and find the differences between its versions challenging. I'm curious to know whether you think the PPO algorithm has potential here.

Kind regards,
Martin
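A minimal sketch of the Gym-vs-Gymnasium API gap that usually causes trouble when plugging older PPO code into this environment. It assumes gym_trading_env registers as "TradingEnv", takes a `positions` list and `windows` argument, and reads observation features from columns prefixed with "feature_"; those details are my reading of the library, not something confirmed in this thread.

```python
import gymnasium as gym
import gym_trading_env  # registers the "TradingEnv" environment id
import numpy as np
import pandas as pd

# Hypothetical price dataframe; replace with real data.
# gym_trading_env (as I understand it) needs a "close" column and uses
# "feature_*" columns as the observation.
df = pd.DataFrame({"close": np.cumsum(np.random.randn(1_000)) + 100.0})
df["feature_return"] = df["close"].pct_change().fillna(0.0)

env = gym.make("TradingEnv", df=df, positions=[-1, 0, 1], windows=10)

# Gymnasium API: reset() returns (obs, info); step() returns five values.
# Classic Gym (which the PPO example code targets) returns obs alone from
# reset() and a single `done` flag from step().
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # the PPO agent's choice would go here
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated       # collapse back to old Gym's `done`
```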

MickyDowns commented 5 months ago

I tested the vanilla Gym-Trading environment (i.e., not vectorized, not multi-dataset) with off-the-shelf StableBaselines3 configurations for A2C, ARS, DQN, PPO, QR-DQN, RecurrentPPO, and TRPO. Setup:

- actions = [-1, 0, 1], window = 10
- training steps = 25k, 50k, 75k, 100k
- 5 synthetic datasets of 11k intervals each: positive/negative slope, two sine perturbations, and various levels of white noise
- 90/10 train/validation split

Best performance came from QR-DQN, A2C, and DQN; PPO was middle of the pack. This doesn't directly answer your question, since none of the runs were tuned, but it is directionally interesting that the simpler A2C beat PPO and that RecurrentPPO (with its LSTM policy) finished near the bottom.
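For readers who want to reproduce that kind of run, here is a rough sketch of an off-the-shelf StableBaselines3 agent on gym_trading_env with a synthetic trend-plus-sine series and a 90/10 split. The hyperparameters, dataset construction, and gym_trading_env arguments are assumptions for illustration, not MickyDowns's actual configuration.

```python
import gymnasium as gym
import gym_trading_env  # registers "TradingEnv"
import numpy as np
import pandas as pd
from stable_baselines3 import DQN  # A2C, PPO, etc. swap in the same way

# Synthetic series: positive slope + two sine perturbations + white noise.
n = 11_000
t = np.arange(n)
price = 100 + 0.01 * t + 3.0 * np.sin(t / 50) + 1.5 * np.sin(t / 7) \
        + np.random.normal(0.0, 0.5, n)
df = pd.DataFrame({"close": price})
df["feature_return"] = df["close"].pct_change().fillna(0.0)

# 90/10 train/validation split.
split = int(0.9 * n)
train_df, valid_df = df.iloc[:split], df.iloc[split:]

train_env = gym.make("TradingEnv", df=train_df, positions=[-1, 0, 1], windows=10)

# Off-the-shelf SB3 configuration, one of the step budgets mentioned above.
model = DQN("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=25_000)

# Roll the trained policy over the held-out 10%.
valid_env = gym.make("TradingEnv", df=valid_df, positions=[-1, 0, 1], windows=10)
obs, info = valid_env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = valid_env.step(int(action))
    done = terminated or truncated
```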