Lightning-Universe / lightning-bolts

Toolbox of models, callbacks, and datasets for AI/ML researchers.
https://lightning-bolts.readthedocs.io
Apache License 2.0

Optimise RL Code #185

Open HenryJia opened 4 years ago

HenryJia commented 4 years ago

🚀 Feature

There seem to be a fair few inefficiencies in the RL model code.

In both the VPG and DQN code, the network forward pass is computed twice: once to generate the trajectory, and then again in the loss function.

https://github.com/PyTorchLightning/pytorch-lightning-bolts/blob/master/pl_bolts/datamodules/experience_source.py#L165

https://github.com/PyTorchLightning/pytorch-lightning-bolts/blob/master/pl_bolts/models/rl/vanilla_policy_gradient_model.py#L146

https://github.com/PyTorchLightning/pytorch-lightning-bolts/blob/master/pl_bolts/losses/rl.py#L35

Because of the way PyTorch stores the computational graph, it is sufficient to run the network once while generating the trajectory, store the output, and compute the loss on that output at each training step. The current approach pointlessly doubles the computational cost.
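As an illustration, here is a minimal sketch of the idea (not the bolts implementation; `rollout` and `vpg_loss` are hypothetical names, and a discrete-action, old-style Gym env is assumed): keep the graph-carrying log-probs produced while acting and reuse them in the loss, instead of re-running the network in the training step.

```python
import torch
from torch import nn
from torch.distributions import Categorical


def rollout(policy: nn.Module, env, max_steps: int = 128):
    """Collect one trajectory, keeping the log-probs produced while acting."""
    obs = env.reset()
    log_probs, rewards = [], []
    for _ in range(max_steps):
        logits = policy(torch.as_tensor(obs, dtype=torch.float32))
        dist = Categorical(logits=logits)
        action = dist.sample()
        # store the graph-carrying log-prob instead of recomputing it in the loss
        log_probs.append(dist.log_prob(action))
        obs, reward, done, _ = env.step(action.item())
        rewards.append(reward)
        if done:
            break
    return torch.stack(log_probs), torch.tensor(rewards, dtype=torch.float32)


def vpg_loss(log_probs: torch.Tensor, returns: torch.Tensor) -> torch.Tensor:
    """REINFORCE-style loss computed directly on the stored log-probs."""
    return -(log_probs * returns).mean()
```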

Furthermore, in both the VPG and DQN code, multiple environments are supported, but no parallelisation is applied across them. This takes away a significant proportion of the advantage of using multiple environments in the first place.

https://github.com/PyTorchLightning/pytorch-lightning-bolts/blob/master/pl_bolts/models/rl/vanilla_policy_gradient_model.py#L202
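For this second point, a rough sketch (again with hypothetical helper names, assuming discrete actions and an old-style Gym `step` API) of batching the forward pass across environments rather than looping over them one at a time:

```python
import numpy as np
import torch
from torch.distributions import Categorical


def act_batched(policy, observations):
    """Stack the latest observation from every env and run one forward pass."""
    obs_batch = torch.as_tensor(np.stack(observations), dtype=torch.float32)
    logits = policy(obs_batch)  # single batched forward pass for all envs
    return Categorical(logits=logits).sample().tolist()


def step_all(envs, policy, observations):
    """Advance each env by one step using the batched actions."""
    actions = act_batched(policy, observations)
    results = [env.step(a) for env, a in zip(envs, actions)]
    next_obs, rewards, dones, _infos = zip(*results)
    return list(next_obs), list(rewards), list(dones)
```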

Borda commented 4 years ago

@djbyrne mind having a look, or working on it together? :]