Add Proximal Policy Optimization

Lightning-Universe / lightning-bolts

Toolbox of models, callbacks, and datasets for AI/ML researchers.

https://lightning-bolts.readthedocs.io

Apache License 2.0


Add Proximal Policy Optimization #368

Closed sidhantls closed 3 years ago

sidhantls commented 4 years ago

🚀 Feature

Implementation of the PPO RL algorithm.

Motivation

As brought up in issue 186, the RL section of Bolts currently includes only variants of DQN and VPG and lacks more modern RL algorithms such as PPO. Would a PPO implementation be of interest for Bolts? I'm happy to discuss and contribute along these lines if there is scope.

Pitch

The implementation will be similar to the other policy gradient methods already implemented, such as Reinforce. It will need to incorporate an actor-critic model, corresponding methods to calculate the surrogate loss, and so on.
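For readers unfamiliar with the surrogate loss mentioned in the pitch, here is a minimal sketch of the clipped variant (PPO-Clip). It uses plain Python floats rather than tensors so it runs without dependencies. The function name, argument names, and the default `clip_eps=0.2` are illustrative assumptions, not part of Bolts or of any existing implementation.

```python
import math


def clipped_surrogate_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Negative mean of the PPO clipped surrogate objective over a batch.

    All names and the default clip_eps are hypothetical, chosen for illustration.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Restrict the ratio to [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    # Negate so that minimizing the loss maximizes the objective.
    return -total / len(advantages)
```

In a tensor-based version the same three steps (ratio, clip, elementwise minimum) would operate on whole batches at once; the critic's value loss and any entropy bonus would be added separately.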

github-actions[bot] commented 4 years ago

Hi! Thanks for your contribution! Great first issue!

akihironitta commented 4 years ago

@sid-sundrani Thank you for your suggestion! In my opinion, it would be nice to have more models in Bolts. Let's hear what other core contributors say :]

sidhantls commented 4 years ago

I wrote a rudimentary PPO implementation on Lightning Bolts. It performs similarly to the benchmark PPO on CartPole-v0. It doesn't support GPU. Most importantly, unlike OpenAI's implementation, it does not run multiple steps of gradient descent on each batch of data. I wasn't quite sure how to make this work with the trainer.

If anyone is interested in building on it, let me know. I've linked my fork.
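To make the missing piece concrete, here is a rough, framework-agnostic sketch of running several epochs of minibatch updates on one collected batch. It is a toy: a two-action softmax policy with hand-written gradients, not the code in the fork and not OpenAI's implementation. Every name and default value (`ppo_update`, `n_epochs`, `minibatch_size`, `lr`, `clip_eps`) is an assumption made for illustration, and how to wire such a loop into the Lightning trainer is left open.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def ppo_update(logits, batch, n_epochs=4, minibatch_size=2, lr=0.1, clip_eps=0.2, seed=0):
    """Reuse one batch for n_epochs of minibatch gradient ascent (hypothetical helper).

    batch holds (action, advantage, old_log_prob) tuples collected under the old policy.
    """
    rng = random.Random(seed)
    logits = list(logits)
    for _ in range(n_epochs):
        # Reshuffle the same batch every epoch.
        order = list(range(len(batch)))
        rng.shuffle(order)
        for start in range(0, len(order), minibatch_size):
            idx = order[start:start + minibatch_size]
            probs = softmax(logits)
            grad = [0.0] * len(logits)
            for i in idx:
                action, adv, old_lp = batch[i]
                ratio = probs[action] / math.exp(old_lp)
                clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
                # Gradient flows only when the unclipped term is the active minimum.
                if ratio * adv <= clipped * adv:
                    for k in range(len(logits)):
                        indicator = 1.0 if k == action else 0.0
                        # d(ratio * adv)/d(logit_k) for a softmax policy.
                        grad[k] += adv * ratio * (indicator - probs[k])
            # Ascent step on the clipped surrogate objective.
            logits = [w + lr * g / len(idx) for w, g in zip(logits, grad)]
    return logits


# Toy batch: action 0 had positive advantage, action 1 negative.
old_lp = math.log(0.5)
batch = [(0, 1.0, old_lp), (1, -1.0, old_lp), (0, 0.5, old_lp), (1, -0.5, old_lp)]
new_logits = ppo_update([0.0, 0.0], batch)
```

Clipping is what makes the repeated passes safe: once the probability ratio leaves the clip range in the favorable direction, those samples stop contributing gradient, so extra epochs cannot push the policy arbitrarily far from the one that collected the data.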

akihironitta commented 3 years ago

@sid-sundrani Thank you for sharing your implementation! I quickly checked your code, and it seems to have no major problems in terms of formatting. If you're still interested in adding PPO to Bolts, I think you can submit a PR, since @Borda added this issue to the project as To do in Reinforcement Learning :]

@PyTorchLightning/bolts-contributors Since I am not familiar with reinforcement learning, could anyone have a look?

sidhantls commented 3 years ago

Sure, I'm still interested. I'll submit a PR.