UM-ARM-Lab / pytorch_kinematics

Robot kinematics implemented in pytorch
MIT License
367 stars 33 forks

How to compute the gradient of reward w.r.t. model parameters? #8

Closed c4cld closed 9 months ago

c4cld commented 2 years ago

My research interest is the robustness of RL algorithms to environment parameters. I want to modify current RL algorithms so that they perform well when tested in environments with unfamiliar parameters. (For example, an agent is trained in a Cartpole environment with a 1 m pole, and I want it to perform well in a Cartpole environment with a 3 m pole.) To achieve this, I want to learn the relationship between the model parameter values and the RL algorithm's performance (reward), so I need the gradient of the reward with respect to the model parameters. I use the MuJoCo simulator in my experiments, but MuJoCo is not implemented in pure Python, so I cannot get that gradient from it. So my question is: can pytorch_kinematics compute the gradient of the reward with respect to the model parameters? If so, how should I use it to achieve this goal?

LemonPi commented 1 year ago

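I'm not sure, for two reasons: first, PK is not a simulator; second, it depends on whether your reward is a function of the kinematics output.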

If you're using RL to learn a control policy, the first case (that PK is not a simulator) may or may not work for you, depending on whether your environment and problem are quasi-static. If they are quasi-static, then you could maybe use PK as the mapping from what your RL-learned model outputs to what you need to compute a reward function.

The second point depends on whether your reward depends on the output of kinematics. For example, the distance of a transformed link to some goal would be a reward/cost function that you could differentiate through with PK.
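As a rough illustration of a reward that depends only on the kinematics output, here is a minimal sketch in plain PyTorch. It does not use the pytorch_kinematics API; `planar_link_tip` is a hypothetical hand-written forward kinematics function for a single revolute link, standing in for whatever transform PK would return.

```python
import torch


def planar_link_tip(theta: torch.Tensor, link_length: float = 1.0) -> torch.Tensor:
    """Hypothetical stand-in for forward kinematics: tip position (x, y)
    of a single revolute link rotating about the origin."""
    return torch.stack([link_length * torch.cos(theta),
                        link_length * torch.sin(theta)])


# Joint angle we want a gradient for.
theta = torch.tensor(0.0, requires_grad=True)
goal = torch.tensor([0.0, 1.0])

# Reward is the negative distance from the link tip to the goal.
tip = planar_link_tip(theta)
reward = -torch.linalg.norm(tip - goal)

# Autograd differentiates the reward through the kinematics.
reward.backward()
print("d(reward)/d(theta) =", theta.grad.item())
```

A positive gradient here means that increasing the joint angle moves the tip closer to the goal, which is the kind of signal a quasi-static reward can provide.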

Specific to your cartpole environment, I think it's doable. If you formulate the pole length as a prismatic joint, then you could set requires_grad=True on the pole length joint value, use it in forward kinematics, and compute the reward based on the output transform (e.g. the distance of the end of the cartpole link to a goal set).
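The cartpole idea can be sketched roughly as follows. This is plain PyTorch, not the pytorch_kinematics API: `pole_tip` is a hypothetical hand-written forward kinematics function in which the pole length plays the role of the prismatic joint value, and the goal point and squared-distance reward are assumptions made for illustration.

```python
import torch


def pole_tip(cart_x: torch.Tensor,
             pole_angle: torch.Tensor,
             pole_length: torch.Tensor) -> torch.Tensor:
    """Hypothetical forward kinematics for a planar cartpole.
    The angle is measured from the upward vertical; returns the tip (x, z)."""
    x = cart_x + pole_length * torch.sin(pole_angle)
    z = pole_length * torch.cos(pole_angle)
    return torch.stack([x, z])


# Treat the pole length like a prismatic joint value and track its gradient.
pole_length = torch.tensor(1.0, requires_grad=True)
cart_x = torch.tensor(0.0)
pole_angle = torch.tensor(0.0)   # pole pointing straight up
goal = torch.tensor([0.0, 3.0])  # assumed goal: tip position of a 3 m upright pole

# Reward is the negative squared distance from the pole tip to the goal.
tip = pole_tip(cart_x, pole_angle, pole_length)
reward = -torch.sum((tip - goal) ** 2)

# Gradient of the reward with respect to the pole length.
reward.backward()
print("d(reward)/d(pole_length) =", pole_length.grad.item())
```

With PK, the hand-written `pole_tip` would presumably be replaced by a forward kinematics call on a chain whose prismatic joint value is the tensor with requires_grad=True; the reward and the call to `backward()` would stay the same.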

PeterMitrano commented 9 months ago

Closing since there hasn't been any follow up.