NVlabs / DiffRL

[ICLR 2022] Accelerated Policy Learning with Parallel Differentiable Simulation
https://short-horizon-actor-critic.github.io/

Torch deterministic #17

Open HaoxiangYou opened 3 weeks ago

HaoxiangYou commented 3 weeks ago

Thank you for providing this awesome repo!

I am trying to make results consistent across runs via seeding(seed, torch_deterministic=True).
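For reference, here is a minimal sketch of what I assume such a seeding helper does (the exact implementation in this repo may differ):

```python
import os
import random

import numpy as np
import torch


def seeding(seed=0, torch_deterministic=False):
    """Seed Python, NumPy, and PyTorch; optionally enforce deterministic kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

    if torch_deterministic:
        # Some CUDA ops require this env var when deterministic algorithms are enforced.
        os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        torch.use_deterministic_algorithms(True)
    else:
        torch.backends.cudnn.deterministic = False
        torch.backends.cudnn.benchmark = True

    return seed
```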

It is known that torch has a broadcasting issue with deterministic algorithms: https://github.com/pytorch/pytorch/issues/79987
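To illustrate the pattern I mean, here is a hypothetical standalone snippet (not code from this repo): an advanced-index assignment whose right-hand side has to be broadcast, which on the PyTorch versions discussed in the linked issue can fail under torch.use_deterministic_algorithms(True).

```python
import os

import torch

# Deterministic mode, as enabled by seeding(..., torch_deterministic=True).
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.use_deterministic_algorithms(True)

joint_q = torch.zeros(8, 15, device="cuda")      # stand-in for state.joint_q.view(num_envs, -1)
env_ids = torch.tensor([1, 3, 5], device="cuda")
start_rotation = torch.tensor([0.0, 0.0, 0.0, 1.0], device="cuda")

# The RHS (shape (4,)) must be broadcast to (len(env_ids), 4) inside index_put_.
# On affected PyTorch versions this line can raise an error in deterministic mode.
joint_q[env_ids, 3:7] = start_rotation

# Making the expansion explicit (as in the fix described below) avoids the implicit broadcast.
joint_q[env_ids, 3:7] = start_rotation.unsqueeze(0).expand(len(env_ids), -1)
```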

So I manually fixed the broadcasting in each environment. For example, in envs/ant.py (Line 204-206) I changed the code to:

    self.state.joint_q.view(self.num_envs, -1)[env_ids, 3:7] = self.start_rotation.clone().unsqueeze(0).expand(len(env_ids), -1)
    self.state.joint_q.view(self.num_envs, -1)[env_ids, 7:] = self.start_joint_q.clone().unsqueeze(0).expand(len(env_ids), -1)
    self.state.joint_qd.view(self.num_envs, -1)[env_ids, :] = torch.zeros(size=(len(env_ids), self.num_joint_qd), device=self.device)

After these changes, I ran the experiments with and without torch_deterministic=True. For example, below is the Ant test, where the blue curve is without torch_deterministic=True and the orange curve is with torch_deterministic=True.

[Figure: Ant training reward curves; blue = torch_deterministic=False, orange = torch_deterministic=True]

The non-deterministic run is similar to the paper's results; however, in the deterministic setting the reward remains unchanged throughout training.

Does anyone have ideas about what other issues torch_deterministic=True might introduce? Thank you very much!