Why are dones in MarkovVectorEnv.step() transformed to int?

Farama-Foundation / SuperSuit

A collection of wrappers for Gymnasium and PettingZoo environments (being merged into gymnasium.wrappers and pettingzoo.wrappers)

In MarkovVectorEnv.step() dones returned by the parallel env are transformed to numpy.uint8 with the following lines:

dns = np.array(
    [dones.get(agent, False) for agent in self.par_env.possible_agents],
    dtype=np.uint8,
)
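A minimal standalone sketch of what that conversion produces. The agent names and the dones dict below are hypothetical stand-ins for self.par_env.possible_agents and the dict returned by the parallel env; the point is only that the boolean flags come out as 0/1 integers, not as a boolean array.

```python
import numpy as np

# Hypothetical stand-ins for self.par_env.possible_agents and the
# per-agent dones dict returned by the parallel env.
possible_agents = ["agent_0", "agent_1", "agent_2"]
dones = {"agent_0": False, "agent_2": True}  # agent_1 missing from the dict

# Same conversion as in MarkovVectorEnv.step(): missing agents default to
# False, and every flag is stored as an unsigned 8-bit integer.
dns = np.array(
    [dones.get(agent, False) for agent in possible_agents],
    dtype=np.uint8,
)
print(dns, dns.dtype)  # [0 0 1] uint8
```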

Why is this the case?

This behavior can have unexpected consequences. For example, to do parameter sharing with Stable-Baselines3, we wrap the parallel env into a MarkovVectorEnv with pettingzoo_env_to_vec_env_v1() and then use concat_vec_envs_v1(), as shown in the tutorial. However, if we then add observation/reward normalization with Stable-Baselines3's VecNormalize, we get a silent bug, because the line

self.returns[dones] = 0

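in VecNormalize.step_wait() treats the uint8 dones array as integer indices rather than as a boolean mask. Every 0 (not done) selects index 0 and every 1 (done) selects index 1, so the returns of env_0 are reset whenever any done is False, those of env_1 whenever any done is True, and the returns of the envs that actually finished are never reset.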
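A small NumPy sketch of the difference, assuming (as in VecNormalize) that returns is a float array with one entry per env; the return values are made up. It compares indexing with the uint8 dones array against indexing with the same array cast to bool.

```python
import numpy as np

# Four hypothetical envs; only env 3 has finished its episode.
returns_int = np.array([5.0, 6.0, 7.0, 8.0])
returns_bool = returns_int.copy()
dones = np.array([0, 0, 0, 1], dtype=np.uint8)

# Integer array -> fancy indexing: the 0s select index 0, the 1 selects index 1.
returns_int[dones] = 0
print(returns_int)   # [0. 0. 7. 8.]  env 0 and env 1 reset, env 3 untouched

# Boolean mask -> only the env whose done flag is set is reset.
returns_bool[dones.astype(bool)] = 0
print(returns_bool)  # [5. 6. 7. 0.]
```

One possible workaround, under these assumptions, is to make sure dones reach VecNormalize as a boolean array (e.g. via `dones.astype(bool)`), so that the indexing acts as a mask.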

Why are dones in MarkovVectorEnv.step() transformed to int? #175