Toni-SM / skrl

Modular reinforcement learning library (on PyTorch and JAX) with support for NVIDIA Isaac Gym, Omniverse Isaac Gym and Isaac Lab
https://skrl.readthedocs.io/
MIT License
443 stars 43 forks

Environment reset function #73

Closed IanWangg closed 1 year ago

IanWangg commented 1 year ago

Hi, I am looking into using skrl+isaacgym as future research tools. Many thanks to the authors for providing such a quality library.

I am a bit confused by the implementation of IsaacGymPreview4Wrapper and the trainers. The following are the reset function of the wrapper and its usage in the trainer:

def reset(self) -> Tuple[torch.Tensor, Any]:
    """Reset the environment

    :return: Observation, info
    :rtype: torch.Tensor and any other info
    """
    if self._reset_once:
        self._obs_dict = self._env.reset()
        self._reset_once = False
    return self._obs_dict["obs"], {}
# reset environments
with torch.no_grad():
    if terminated.any() or truncated.any():
        states, infos = self.env.reset()
    else:
        states.copy_(next_states)

It seems that, when using multiple environments, once one of them terminates, all of them will get reset? Or is there some mechanism on the Isaac Gym side that deals with this case, so that only the terminated ones get reset?

If I am correct (all of them get reset if one of them terminates), why is it designed like this? Not many algorithms can take advantage of multiple environments, but the PPO implementations that do usually do not reset all environments when only one terminates.

Thank you in advance for any explanation!

Toni-SM commented 1 year ago

Hi @IanWangg

The implementations of multiple environments (parallel environments in the case of Isaac Gym preview, Isaac Orbit and Omniverse Isaac Gym; or vectorized environments using OpenAI Gym or Farama Gymnasium) where num_envs > 1 handle the restart of terminated sub-environments internally.

In these cases, it is only necessary to reset all environments from the outside (i.e. from skrl's trainer) once, at the beginning of the training/evaluation.

In fact, the wrappers for those environment types only return the observations (and infos) in subsequent invocations, without resetting the environments. This is done via the self._reset_once flag.
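The effect of the reset-once flag can be illustrated with a minimal mock (MockIsaacEnv and ResetOnceWrapper below are hypothetical stand-ins for illustration only, not skrl's actual classes; the real wrapper operates on actual Isaac Gym environments):

```python
# Sketch of the reset-once pattern: the underlying environment's reset()
# is forwarded only on the first call; later calls return cached observations.
# MockIsaacEnv is hypothetical and merely counts real reset() invocations.

class MockIsaacEnv:
    def __init__(self):
        self.reset_calls = 0

    def reset(self):
        self.reset_calls += 1
        return {"obs": [0.0, 0.0]}  # placeholder observations


class ResetOnceWrapper:
    """Forwards reset() to the wrapped env only once."""
    def __init__(self, env):
        self._env = env
        self._reset_once = True
        self._obs_dict = None

    def reset(self):
        if self._reset_once:
            self._obs_dict = self._env.reset()
            self._reset_once = False
        return self._obs_dict["obs"], {}


env = MockIsaacEnv()
wrapper = ResetOnceWrapper(env)
for _ in range(5):       # the trainer may call reset() many times...
    obs, info = wrapper.reset()
print(env.reset_calls)   # ...but the underlying env was reset only once
```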

The trainer implementation in the released version (which is based on the basic Gym/Gymnasium API for a single environment) always checks for termination/truncation and calls the wrapper's reset method, regardless of whether the setup contains multiple environments or not. In the case of multiple environments, this does not produce an effective reset, at least in subsequent calls.

Thus, in the case of multiple environments, this practice is unnecessary and adds computational overhead. Therefore, the trainer implementation in upcoming versions of skrl will handle this differently, as implemented in the unreleased multi-agent branch.
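The idea can be sketched as a training loop that, for parallel environments, performs a single external reset before the loop and never checks per timestep afterwards (MockVecEnv is a hypothetical stand-in with dummy dynamics, not skrl's trainer or any Isaac Gym API):

```python
# Hypothetical trainer-loop sketch: with parallel environments that
# autoreset internally, the external reset is needed only once, before
# the loop, rather than being checked at every timestep.

class MockVecEnv:
    """Stand-in for a parallel env that autoresets sub-envs internally."""
    num_envs = 4

    def reset(self):
        return [0.0] * self.num_envs, {}

    def step(self, actions):
        states = [a + 1.0 for a in actions]       # dummy transition
        terminated = [s >= 3.0 for s in states]   # dummy termination rule
        # sub-envs that terminated are restarted *inside* step()
        states = [0.0 if t else s for s, t in zip(states, terminated)]
        return states, terminated, {}


env = MockVecEnv()
states, infos = env.reset()   # single external reset, before training
for timestep in range(10):
    states, terminated, infos = env.step(states)
    # no env.reset() here: terminated sub-envs were already restarted
```

The design choice is simply to move the reset out of the hot loop: since the autoreset happens inside step(), any external per-timestep reset call is dead code for num_envs > 1.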

https://github.com/Toni-SM/skrl/blob/f6c7d717e2ad3d671f8638dee4c80483510541ec/skrl/trainers/torch/base.py#L201-L209

IanWangg commented 1 year ago

So, in a single-agent setup with parallel environments from the Isaac family, calling reset() from outside will only reset environments that are terminated or truncated, correct? And for Gym/Gymnasium parallel environments, calling reset() from outside will actually reset all environments, regardless of whether they are terminated/truncated or not?

Toni-SM commented 1 year ago

For both cases, the NVIDIA Isaac family and vectorized gym/gymnasium environments, where num_envs > 1, calling the original environment's .reset() method in subsequent timesteps (i.e. at any point other than the beginning of the training/evaluation) is an error, because these interfaces autoreset sub-environments internally after they terminate or truncate.

That is why calling the wrapped environment's .reset() method from skrl's trainers (from outside) does not invoke the original environment's method in subsequent timesteps and only returns the environment's next states (and infos).
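One way to picture this is a wrapper that caches the observations produced by step() and hands those back from reset() on every call after the first (a simplified, hypothetical sketch under the assumption of an internally autoresetting vectorized env, not skrl's exact code):

```python
# Simplified sketch: after the first call, the wrapper's reset() does not
# touch the environment; it returns the latest observations cached in step().
# MockVecEnv is hypothetical, with trivial dynamics for two sub-envs.

class MockVecEnv:
    def reset(self):
        return [0, 0]                     # initial observations

    def step(self, actions):
        return [a + 1 for a in actions], {}  # dummy dynamics (autoreset assumed internal)


class Wrapper:
    def __init__(self, env):
        self._env = env
        self._reset_once = True
        self._obs = None

    def reset(self):
        if self._reset_once:
            self._obs = self._env.reset()
            self._reset_once = False
        return self._obs, {}              # subsequent calls: cached obs only

    def step(self, actions):
        self._obs, info = self._env.step(actions)
        return self._obs, info


w = Wrapper(MockVecEnv())
obs, _ = w.reset()   # real reset: [0, 0]
obs, _ = w.step(obs) # [1, 1]
obs, _ = w.reset()   # no real reset: returns the cached [1, 1]
```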

For example, the OpenAI Gym environment wrapper handles the vectorized environment as follows (similar to how the Isaac Gym wrapper does it, as you showed in your first post):

https://github.com/Toni-SM/skrl/blob/6b8b70fc2f5fd13087c70a9b51f0ef630c638bcc/skrl/envs/torch/wrappers.py#L460-L464

IanWangg commented 1 year ago

Thank you for your explanation!