ClementPerroud / Rainbow-Agent

Replication of the Rainbow reinforcement learning agent in TensorFlow 2, from the paper "Rainbow: Combining Improvements in Deep Reinforcement Learning"
MIT License

Combining PPO agent with gym_trading_env #1

Open martin0 opened 6 months ago

martin0 commented 6 months ago

Hi,

I have been studying reinforcement learning a little. I was aiming to combine the Proximal Policy Optimization (PPO) example from https://github.com/philtabor/Youtube-Code-Repository/tree/master/ReinforcementLearning/PolicyGradient/PPO/tf2 with your gym_trading_env environment.
I'm new to Gym and find the differences between its versions challenging. I'm curious to know whether you think the PPO algorithm has potential here.

Kind regards,
Martin
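A minimal sketch of the Gym-vs-Gymnasium API gap that usually causes trouble when plugging older PPO code into this environment. It assumes gym_trading_env registers as "TradingEnv", takes a `positions` list and `windows` argument, and reads observation features from columns prefixed with "feature_"; those details are my reading of the library, not something confirmed in this thread.

```python
import gymnasium as gym
import gym_trading_env  # registers the "TradingEnv" environment id
import numpy as np
import pandas as pd

# Hypothetical price dataframe; replace with real data.
# gym_trading_env (as I understand it) needs a "close" column and uses
# "feature_*" columns as the observation.
df = pd.DataFrame({"close": np.cumsum(np.random.randn(1_000)) + 100.0})
df["feature_return"] = df["close"].pct_change().fillna(0.0)

env = gym.make("TradingEnv", df=df, positions=[-1, 0, 1], windows=10)

# Gymnasium API: reset() returns (obs, info); step() returns five values.
# Classic Gym (which the PPO example code targets) returns obs alone from
# reset() and a single `done` flag from step().
obs, info = env.reset()
done = False
while not done:
    action = env.action_space.sample()  # the PPO agent's choice would go here
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated       # collapse back to old Gym's `done`
```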

MickyDowns commented 5 months ago

I tested the vanilla Gym-Trading environment (i.e., not vectorized, not multi-dataset) with off-the-shelf StableBaselines3 configurations for A2C, ARS, DQN, PPO, QR-DQN, RecurrentPPO, and TRPO. Setup:

- actions = [-1, 0, 1], window = 10
- training steps = 25k, 50k, 75k, 100k
- 5 synthetic datasets of 11k intervals each: positive/negative slope, two sine perturbations, and various levels of white noise
- 90/10 train/validation split

Best performance came from QR-DQN, A2C, and DQN; PPO was middle of the pack. This doesn't directly answer your question, since none of the runs were tuned, but it is directionally interesting that the simpler A2C beat PPO and that RecurrentPPO (with its LSTM policy) finished near the bottom.
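For readers who want to reproduce that kind of run, here is a rough sketch of an off-the-shelf StableBaselines3 agent on gym_trading_env with a synthetic trend-plus-sine series and a 90/10 split. The hyperparameters, dataset construction, and gym_trading_env arguments are assumptions for illustration, not MickyDowns's actual configuration.

```python
import gymnasium as gym
import gym_trading_env  # registers "TradingEnv"
import numpy as np
import pandas as pd
from stable_baselines3 import DQN  # A2C, PPO, etc. swap in the same way

# Synthetic series: positive slope + two sine perturbations + white noise.
n = 11_000
t = np.arange(n)
price = 100 + 0.01 * t + 3.0 * np.sin(t / 50) + 1.5 * np.sin(t / 7) \
        + np.random.normal(0.0, 0.5, n)
df = pd.DataFrame({"close": price})
df["feature_return"] = df["close"].pct_change().fillna(0.0)

# 90/10 train/validation split.
split = int(0.9 * n)
train_df, valid_df = df.iloc[:split], df.iloc[split:]

train_env = gym.make("TradingEnv", df=train_df, positions=[-1, 0, 1], windows=10)

# Off-the-shelf SB3 configuration, one of the step budgets mentioned above.
model = DQN("MlpPolicy", train_env, verbose=1)
model.learn(total_timesteps=25_000)

# Roll the trained policy over the held-out 10%.
valid_env = gym.make("TradingEnv", df=valid_df, positions=[-1, 0, 1], windows=10)
obs, info = valid_env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = valid_env.step(int(action))
    done = terminated or truncated
```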