Toni-SM / skrl

Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Omniverse Isaac Gym and Isaac Lab
https://skrl.readthedocs.io/
MIT License

Use single forward pass in shared model architectures #156

Closed lopatovsky closed 2 weeks ago

lopatovsky commented 3 weeks ago

Single forward pass

Motivation:

When the shared model is used, the forward pass is called twice: once for the policy and once for the value. The input values for both calls are identical, so the output of the shared layers could be cached to improve performance.

Note: a single forward pass also affects how the autograd graph is constructed, so a significant speedup also occurs during the backward pass.
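For context, here is a minimal sketch of what a single forward pass could look like in a shared policy/value model, following the mixin-based shared-model pattern from the skrl documentation. The layer sizes and the `_shared_output` attribute name are illustrative assumptions, not the exact code from this PR:

```python
import torch
import torch.nn as nn

from skrl.models.torch import Model, GaussianMixin, DeterministicMixin


class SharedModel(GaussianMixin, DeterministicMixin, Model):
    """Shared policy/value model that reuses one backbone forward pass (sketch)."""

    def __init__(self, observation_space, action_space, device):
        Model.__init__(self, observation_space, action_space, device)
        GaussianMixin.__init__(self, clip_actions=False, role="policy")
        DeterministicMixin.__init__(self, clip_actions=False, role="value")

        # shared backbone (sizes are assumptions)
        self.net = nn.Sequential(nn.Linear(self.num_observations, 64), nn.ELU(),
                                 nn.Linear(64, 64), nn.ELU())
        # separate heads
        self.mean_layer = nn.Linear(64, self.num_actions)
        self.log_std_parameter = nn.Parameter(torch.zeros(self.num_actions))
        self.value_layer = nn.Linear(64, 1)

        self._shared_output = None  # cache for the backbone output

    def act(self, inputs, role):
        if role == "policy":
            return GaussianMixin.act(self, inputs, role)
        elif role == "value":
            return DeterministicMixin.act(self, inputs, role)

    def compute(self, inputs, role):
        if role == "policy":
            # run the backbone once and keep the result for the upcoming value call
            self._shared_output = self.net(inputs["states"])
            return self.mean_layer(self._shared_output), self.log_std_parameter, {}
        elif role == "value":
            # reuse the cached backbone output if the policy pass already ran
            shared_output = self.net(inputs["states"]) if self._shared_output is None \
                else self._shared_output
            self._shared_output = None  # invalidate the cache
            return self.value_layer(shared_output), {}
```

This relies on the agent querying the policy first and the value immediately afterwards with the same states, which is exactly the ordering assumption discussed in the note at the end of this issue.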

Speed eval:

| Library | Single forward pass | Time (s) | Slowing factor (base: RLGames, mixed precision = True) | Slowing factor (base: RLGames, mixed precision = False) |
|---|---|---|---|---|
| RLGames, mixed precision = False | Yes | 141 | 1.259x | 1 (base) |
| RLGames, mixed precision = True | Yes | 112 | 1 (base) | 0.794x |
| SKRL * | No | 199 | 1.777x | 1.411x |
| SKRL * | Yes | 151 | 1.348x | 1.071x |

\* Mixed precision = True

Quality eval:

We trained a policy for our task with each of the configurations multiple times. We didn't observe any statistically significant difference in the quality of the final results.

Notice: the single- and double-pass runs would be identical in an ideal world, but because of finite floating-point precision and a different order of gradient computations, they diverge gradually.
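As a side note on why the runs drift apart: floating-point reductions are not associative, so computing the same quantities in a different order (as the single-pass autograd graph does) yields slightly different numbers that compound over training. A tiny illustrative check (not from the PR):

```python
import torch

torch.manual_seed(0)
x = torch.rand(100000, dtype=torch.float32)

s1 = x.sum()                              # one reduction order
s2 = x.view(100, 1000).sum(dim=1).sum()   # same values, chunked reduction order

print(s1.item(), s2.item())
print((s1 - s2).abs().item())             # tiny but typically nonzero difference
```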

Note:

- This implementation is minimalistic, but it's quite dangerous to generalize, as it requires that the value forward pass always follows the policy forward pass. To make it safer we may implement caching of the input and check whether the next input is the same (a sketch of both checks follows this list):
  - a) check whether the inputs are references to the same object
  - b) compare the input and cached input tensors directly. This adds some computational overhead, but it is negligible compared to the time saved.
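A hedged sketch of the two checks, written as a drop-in variant of the `compute()` method from the shared-model sketch above. The `_cached_states` attribute (initialized to `None` in `__init__`) and `_shared_output` are illustrative names, not part of skrl's API:

```python
def compute(self, inputs, role):
    states = inputs["states"]
    if role == "policy":
        # remember both the input and the backbone output
        self._cached_states = states
        self._shared_output = self.net(states)
        return self.mean_layer(self._shared_output), self.log_std_parameter, {}
    elif role == "value":
        # option a) cheap identity check: is this the same tensor object as the cached input?
        same_input = states is self._cached_states
        # option b) value comparison: same content even if it is a different object
        # (small extra cost, negligible next to a full forward pass; requires `import torch`)
        # same_input = self._cached_states is not None and torch.equal(states, self._cached_states)
        shared_output = self._shared_output if same_input else self.net(states)
        self._cached_states, self._shared_output = None, None  # invalidate the cache
        return self.value_layer(shared_output), {}
```

With either check, a value call that does not match the cached policy input simply falls back to recomputing the backbone, so the ordering assumption is no longer load-bearing.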