Code duplication in vanilla policy gradients

Lightning-Universe / lightning-bolts

Toolbox of models, callbacks, and datasets for AI/ML researchers.

https://lightning-bolts.readthedocs.io

Apache License 2.0


Code duplication in vanilla policy gradients #182

Open HenryJia opened 4 years ago

HenryJia commented 4 years ago

🐛 Bug

There appear to be two near-identical implementations of VPG that differ only by a small change in the loss function:
https://github.com/PyTorchLightning/pytorch-lightning-bolts/blob/master/pl_bolts/models/rl/reinforce_model.py
https://github.com/PyTorchLightning/pytorch-lightning-bolts/blob/master/pl_bolts/models/rl/vanilla_policy_gradient_model.py
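One possible way to remove this kind of duplication is to keep the shared logic in a single base class and let each variant override only its loss. The following is a minimal, framework-free sketch of that idea, not the code in either linked file: `PolicyGradientBase`, `ReinforceSketch`, `BaselineSketch`, and `episode_loss` are hypothetical names, and the mean-return baseline stands in for whatever the real loss difference is.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


class PolicyGradientBase:
    """Hypothetical shared base: everything except the loss lives here."""

    def __init__(self, gamma: float = 0.99) -> None:
        self.gamma = gamma

    def loss(self, log_probs: Sequence[float], returns: Sequence[float]) -> float:
        # Each variant supplies only this method.
        raise NotImplementedError

    def episode_loss(self, log_probs: Sequence[float], rewards: Sequence[float]) -> float:
        if not rewards or len(log_probs) != len(rewards):
            raise ValueError("need one log-probability per reward, and at least one step")
        return self.loss(log_probs, discounted_returns(rewards, self.gamma))


class ReinforceSketch(PolicyGradientBase):
    """Plain policy-gradient loss: mean of -log_prob * return."""

    def loss(self, log_probs: Sequence[float], returns: Sequence[float]) -> float:
        return -sum(lp * g for lp, g in zip(log_probs, returns)) / len(returns)


class BaselineSketch(PolicyGradientBase):
    """Illustrative variant (an assumption): subtract the mean return as a baseline."""

    def loss(self, log_probs: Sequence[float], returns: Sequence[float]) -> float:
        baseline = sum(returns) / len(returns)
        return -sum(lp * (g - baseline) for lp, g in zip(log_probs, returns)) / len(returns)
```

With this layout, a fix to return computation or input validation is made once, and the two models differ only in their `loss` method.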

github-actions[bot] commented 4 years ago

Hi! Thanks for your contribution! Great first issue!

Borda commented 3 years ago

@HenryJia mind sending a PR? :]

sidhantls commented 3 years ago

That is true. Also, the VPG implementation in vanilla_policy_gradient_model.py collects a trajectory from only one episode at a time and trains the agent on it, whereas the implementation in reinforce_model.py trains on trajectories from multiple episodes.
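To illustrate the difference, here is a rough sketch of multi-episode batching in plain Python. It is an assumption about the general technique, not code from either file: `batch_returns` is a hypothetical helper, and each episode is represented simply as a list of rewards. Returns are computed per episode so that discounting never crosses an episode boundary, then flattened into one training batch.

```python
from typing import List, Sequence


def discounted_returns(rewards: Sequence[float], gamma: float) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns: List[float] = []
    running = 0.0
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def batch_returns(episodes: Sequence[Sequence[float]], gamma: float) -> List[float]:
    """Hypothetical helper: flatten per-episode returns into one batch.

    The running return is reset for every episode, so rewards from one
    episode never leak into the returns of another.
    """
    flat: List[float] = []
    for rewards in episodes:
        flat.extend(discounted_returns(rewards, gamma))
    return flat


# Single-episode training is the special case of a one-element batch.
single = batch_returns([[1.0, 1.0]], gamma=0.5)
multi = batch_returns([[1.0, 1.0], [2.0]], gamma=0.5)
```

Seen this way, the single-episode behaviour could be kept as a batch size of one episode rather than as a separate implementation.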

Borda commented 3 years ago

@sid-sundrani @HenryJia would either of you like to take this over and send a PR?

sidhantls commented 3 years ago

Sure. Are we looking to remove one of these implementations? If so, which one are we keeping?