Open dzako opened 1 year ago
Hi @dzako, thanks for your kind words and appreciation. You are right: for now, `obs` and `state` are wrapped in a stop-gradient operation. While I agree that this would be a desirable feature for certain environments, there are two main considerations:
I will see if it makes sense to add a `stop_gradient` option when calling `gymnax.make`. Let me know if you have ideas/opinions and what your particular use case could be.
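Purely as a sketch of what such an option could look like (the helper name below is hypothetical, not gymnax API), a wrapper could conditionally apply `jax.lax.stop_gradient` to the environment's outputs:

```python
import jax


def maybe_stop_gradient(tree, stop: bool):
    # Hypothetical helper: block gradients through a pytree of env
    # outputs (obs, state, ...) only when `stop` is True.
    return jax.lax.stop_gradient(tree) if stop else tree
```

A `stop_gradient=True` default in `make` would then preserve the current behaviour while letting users opt out.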
I think it makes sense to remove all `stop_gradient`s from the environments themselves, so that RL algorithms downstream have the option to use those gradients if desired. It seems to me that it is the downstream responsibility of an RL algorithm to impose a `stop_gradient` if it happens to require one.
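For instance, a downstream algorithm can impose the stop-gradient exactly where it needs one, e.g. on the bootstrap value in a TD target (a minimal sketch, not gymnax code):

```python
import jax


def td_target(reward, gamma, v_next):
    # The algorithm, not the environment, decides to block gradients
    # through the bootstrap value of the TD target.
    return reward + gamma * jax.lax.stop_gradient(v_next)
```

With this pattern, gradients still flow through `reward` (and hence the environment) whenever the user wants them to.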
I just wanted to bump this issue, because I think it would be very useful to have the ability to differentiate through the dynamics and observation functions. This would allow us to use `gymnax` for model-based control and for explicit modeling of partially observable environments.
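As a minimal illustration of the model-based use case (with toy linear dynamics standing in for an environment, not gymnax's actual `env.step` signature, which also takes a PRNG key and env params), JAX can give the gradient of a rollout's return with respect to the whole action sequence:

```python
import jax
import jax.numpy as jnp


def step(state, action):
    # Toy differentiable dynamics -- purely illustrative.
    next_state = state + 0.1 * action
    reward = -next_state ** 2
    return next_state, reward


def rollout_return(actions, init_state):
    # Sum of rewards over a short open-loop rollout.
    state, total = init_state, 0.0
    for a in actions:
        state, r = step(state, a)
        total = total + r
    return total


# Gradient of the return w.r.t. the action sequence -- the quantity
# a planner or model-based controller would ascend.
grads = jax.grad(rollout_return)(jnp.zeros(3), 1.0)
```

Without `stop_gradient` in the environment, the same pattern would work with a real `env.step`.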
+1 Yeah. This would be a really nice feature. Does anyone know a library that offers a differentiable step function?
@janakact

> Does anyone know a library that offers a differentiable step function?
Shameless self-plug: I have a package for non-linear inverse optimal control that makes use of differentiable step functions. However, the environments are custom partially observable stochastic environments and therefore do not completely correspond to standard environments from `gym`.
Hello, is it possible to return the derivative of the step reward function (with respect to the action), at least for the simplest envs like pendulum or cartpole? Best, Jacek
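For pendulum specifically, the classic cost is already differentiable in the action, so JAX can produce that derivative directly. A self-contained sketch, assuming the standard Pendulum-v1 cost formulation from `gym` (sign-flipped to a reward):

```python
import jax
import jax.numpy as jnp


def pendulum_reward(th, thdot, u):
    # Classic Pendulum-v1 cost, written as a reward; assumed here to
    # match gymnax's formulation. Differentiable in the action u.
    def angle_normalize(x):
        return ((x + jnp.pi) % (2 * jnp.pi)) - jnp.pi

    return -(angle_normalize(th) ** 2 + 0.1 * thdot ** 2 + 0.001 * u ** 2)


# Derivative of the step reward w.r.t. the action: analytically -0.002 * u.
dr_du = jax.grad(pendulum_reward, argnums=2)(jnp.pi / 4, 0.0, 1.0)
```

If the environment's `stop_gradient` wrappers were removed, `jax.grad` through the full `step` would work the same way.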