Closed: sidhantls closed this issue 3 years ago
Hi! Thanks for your contribution! Great first issue!
@sid-sundrani Thank you for your suggestion! In my opinion, it would be nice to have more models in Bolts. Let's hear what other core contributors say :]
I wrote a rudimentary PPO implementation on top of Lightning Bolts. It performs similarly to the benchmark PPO on CartPole-v0. It doesn't support GPU yet. Most importantly, it does not run multiple steps of gradient descent on each batch of collected data, as OpenAI's implementation does. I wasn't quite sure how to make that work with the Trainer.
If anyone is interested in building it up, let me know. I've linked my fork.
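For anyone picking this up, the missing piece described above (several gradient steps on each collected batch) might be handled on the data side. The snippet below is only a minimal, framework-free sketch of that idea: it re-yields shuffled minibatches from the same rollout for a fixed number of epochs. The name `ppo_minibatches` and its parameters are hypothetical and are not part of Bolts or of the linked fork.

```python
import random
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def ppo_minibatches(
    rollout: Sequence[T],
    n_epochs: int,
    minibatch_size: int,
    seed: int = 0,
) -> Iterator[List[T]]:
    """Yield shuffled minibatches, passing over the same rollout n_epochs times.

    Hypothetical helper: each yielded minibatch would correspond to one
    gradient step, so a single rollout is reused for several updates.
    """
    rng = random.Random(seed)
    indices = list(range(len(rollout)))
    for _ in range(n_epochs):
        rng.shuffle(indices)  # new sample order for every pass
        for start in range(0, len(indices), minibatch_size):
            chunk = indices[start:start + minibatch_size]
            yield [rollout[i] for i in chunk]


rollout = list(range(8))  # stand-in for 8 collected transitions
batches = list(ppo_minibatches(rollout, n_epochs=3, minibatch_size=4))
print(len(batches))  # 3 epochs * 2 minibatches per epoch
```

A generator like this could, as one option, back an iterable dataset so that the training step sees each rollout several times before new experience is collected. Whether that fits the Trainer cleanly is an open question here, not something verified.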
@sid-sundrani Thank you for sharing your implementation! I had a quick look at your code and didn't see any major formatting problems. If you're still interested in adding PPO to Bolts, I think you can submit a PR, since @Borda added this issue to the project as To do in Reinforcement Learning :]
@PyTorchLightning/bolts-contributors Since I am not familiar with reinforcement learning, could anyone have a look?
Sure, I'm still interested. I'll submit a PR
🚀 Feature
Implementation of PPO RL algorithm
Motivation
As brought up in issue 186, the RL section of Bolts currently only includes variants of DQN and VPG and lacks some of the more modern RL algorithms such as PPO. Is there interest in having a PPO implementation in Bolts? I'd be happy to discuss and contribute along these lines if there is scope.
Pitch
The implementation would be similar to the other policy gradient methods already implemented, such as Reinforce. It would need to incorporate an actor-critic model, the corresponding methods to calculate the surrogate loss, and so on.
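To make the surrogate loss mentioned in the pitch concrete, here is a minimal sketch of the clipped surrogate objective from the PPO paper, written in plain Python so it runs without any framework. The function name `clipped_surrogate_loss`, its arguments, and the default `clip_eps=0.2` are illustrative assumptions, not an existing Bolts API. A real implementation would operate on tensors and add the value and entropy terms.

```python
import math
from typing import Sequence


def clipped_surrogate_loss(
    new_log_probs: Sequence[float],
    old_log_probs: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,
) -> float:
    """Return the negated mean clipped surrogate objective (a loss to minimise).

    Illustrative sketch only: per sample, take
    min(ratio * A, clip(ratio, 1 - eps, 1 + eps) * A),
    where ratio = pi_new(a|s) / pi_old(a|s).
    """
    if not advantages:
        raise ValueError("need at least one sample")
    if not (len(new_log_probs) == len(old_log_probs) == len(advantages)):
        raise ValueError("all inputs must have the same length")

    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # probability ratio from log-probs
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return -total / len(advantages)


# The new policy doubles the action probability; the advantage is positive,
# so the objective is capped at (1 + clip_eps) * advantage.
loss = clipped_surrogate_loss([math.log(2.0)], [0.0], [1.0])
print(loss)
```

With a negative advantage and the same ratio, the unclipped term is the smaller one, so the clip does not limit the penalty. That asymmetry is what discourages the policy from moving too far in a single update.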