Toni-SM / skrl

Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Omniverse Isaac Gym and Isaac Lab
https://skrl.readthedocs.io/
MIT License

Support gym's new step API #25

Closed juhannc closed 1 year ago

juhannc commented 1 year ago

gym's version 0.25.0 updates the step API: `step` is now supposed to return `terminated: bool` and `truncated: bool` instead of a single `done: bool`.
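For context, the two call signatures side by side (a sketch assuming a plain `gym.Env`; `env` and `action` are placeholders):

```python
# Old step API (gym < 0.25.0): a single done flag
observation, reward, done, info = env.step(action)

# New step API (gym >= 0.25.0): done is split into two flags
observation, reward, terminated, truncated, info = env.step(action)
```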

A quick and dirty fix would be the following. However, the release notes of gym (and a promised blog post yet to be released) mention that `done` is not equal to termination. As I'm not yet sure how much of an impact that would have on skrl, I'm opening this issue.

PS: @Toni-SM, I tried to open a new card in the project, but I don't have the rights to do so. I'd be honored if you would consider adding me to the project! :)

```diff
diff --git a/skrl/envs/torch/wrappers.py b/skrl/envs/torch/wrappers.py
index 62110e0..ffd6489 100644
--- a/skrl/envs/torch/wrappers.py
+++ b/skrl/envs/torch/wrappers.py
@@ -271,6 +271,11 @@ class GymWrapper(Wrapper):
         except Exception as e:
             print("[WARNING] Failed to check for a vectorized environment: {}".format(e))

+        if hasattr(self._env, "new_step_api"):
+            self._new_step_api = self._env.new_step_api
+        else:
+            self._new_step_api = False
+
     @property
     def state_space(self) -> gym.Space:
         """State space
@@ -359,13 +364,16 @@ class GymWrapper(Wrapper):
         :return: The state, the reward, the done flag, and the info
         :rtype: tuple of torch.Tensor and any other info
         """
-        observation, reward, done, info = self._env.step(self._tensor_to_action(actions))
+        if self._new_step_api:
+            observation, reward, done, _, info = self._env.step(self._tensor_to_action(actions))
+        else:
+            observation, reward, done, info = self._env.step(self._tensor_to_action(actions))
         # convert response to torch
         return self._observation_to_tensor(observation), \
                torch.tensor(reward, device=self.device, dtype=torch.float32).view(self.num_envs, -1), \
                torch.tensor(done, device=self.device, dtype=torch.bool).view(self.num_envs, -1), \
                info

     def reset(self) -> torch.Tensor:
         """Reset the environment
Toni-SM commented 1 year ago

Hi @JohannLange

Would you like to open a pull request with this change?

Looking at those variables, it wouldn't be wrong to think of done as terminated OR truncated, whether they come as single boolean values or as lists/ndarrays... Hopefully, the promised blog post will provide relevant details about the new API soon.
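A minimal sketch of that combination, handling both scalars and arrays (the helper name is hypothetical, not part of skrl):

```python
import numpy as np

def as_done(terminated, truncated):
    """Collapse the new flags into the old single done flag."""
    if isinstance(terminated, (bool, np.bool_)):
        return terminated or truncated
    return np.logical_or(terminated, truncated)
```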

Btw, now you should have access to the https://github.com/users/Toni-SM/projects/2/views/8 project...

juhannc commented 1 year ago

> Looking at those variables, it wouldn't be wrong to think of done as terminated OR truncated, whether they come as single boolean values or as lists/ndarrays... Hopefully, the promised blog post will provide relevant details about the new API soon.

Yeah, that makes more sense than my first idea!

> Btw, now you should have access to the github.com/users/Toni-SM/projects/2/views/8 project...

Awesome, thank you!