Open HenryJia opened 4 years ago
Hi! Thanks for your contribution! Great first issue!
@HenryJia mind sending a PR? :]
That is true. Also, the VPG implementation in vanilla_policy_gradient_model.py collects a trajectory from only one episode at a time and trains the agent on it, whereas REINFORCE trains on trajectories from multiple episodes.
@sid-sundrani @HenryJia does anyone want to take it over and send a PR?
Sure. Are we looking to remove one of these implementations? If so, which one are we keeping?
🐛 Bug
It seems like there are two near-identical implementations of VPG, with only a small change in the loss function: https://github.com/PyTorchLightning/pytorch-lightning-bolts/blob/master/pl_bolts/models/rl/reinforce_model.py and https://github.com/PyTorchLightning/pytorch-lightning-bolts/blob/master/pl_bolts/models/rl/vanilla_policy_gradient_model.py
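
To show how small such a difference can be, here is a minimal, dependency-free sketch. It is not the code from either file. The function names are hypothetical, and both the mean-return baseline and the choice of which variant pools episodes are assumptions made for illustration only. The two variants share the same policy-gradient loss and differ only in how the per-step weights are built and in how many episodes feed one update.

```python
def discounted_returns(rewards, gamma=0.99):
    """Reward-to-go for one episode: G_t = r_t + gamma * G_{t+1}."""
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def policy_gradient_loss(log_probs, weights):
    """Shared core: mean over steps of -log pi(a_t | s_t) * weight_t."""
    return -sum(lp * w for lp, w in zip(log_probs, weights)) / len(log_probs)


def multi_episode_loss(episodes, gamma=0.99):
    """Hypothetical variant A: pool steps from several episodes and
    weight each step by its raw discounted return."""
    log_probs, weights = [], []
    for ep_log_probs, ep_rewards in episodes:
        log_probs.extend(ep_log_probs)
        weights.extend(discounted_returns(ep_rewards, gamma))
    return policy_gradient_loss(log_probs, weights)


def single_episode_baseline_loss(log_probs, rewards, gamma=0.99):
    """Hypothetical variant B: one episode per update, with the mean
    return subtracted as a simple baseline (an assumption, see above)."""
    returns = discounted_returns(rewards, gamma)
    baseline = sum(returns) / len(returns)
    return policy_gradient_loss(log_probs, [g - baseline for g in returns])
```

If the two files really do differ only along these lines, one option would be to keep a single model and expose the weighting scheme and the number of episodes per batch as arguments, rather than maintaining two classes.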