`MarkovVectorEnv` casts infos as a Python list, which throws an error when training with CleanRL's multi-agent PPO code

I am running CleanRL's PPO code on a custom PettingZoo environment using the code here. In line 163, the environments are wrapped with Gymnasium's `RecordEpisodeStatistics` wrapper, whose statistics are then used in lines 210-215 to log each player's return after an episode ends.

It turns out that `pettingzoo_env_to_vec_env_v1` wraps the environment in the `MarkovVectorEnv` class. There, in line 59 and also in lines 92 and 101, the infos are cast to a list instead of the usual dict.

Consequently, the aforementioned Gymnasium wrapper throws an error (tested on PZ's Pistonball environment):

----> 6     observations, rewards, terminations, truncations, infos = env.step(actions)
      7 env.close()

File /home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/gymnasium/wrappers/record_episode_statistics.py:95, in RecordEpisodeStatistics.step(self, action)
     87 """Steps through the environment, recording the episode statistics."""
     88 (
     89     observations,
     90     rewards,
   (...)
     93     infos,
     94 ) = self.env.step(action)
---> 95 assert isinstance(
     96     infos, dict
     97 ), f"`info` dtype is {type(infos)} while supported dtype is `dict`. This may be due to usage of other wrappers in the wrong order."
     98 self.episode_returns += rewards
     99 self.episode_lengths += 1

AssertionError: `info` dtype is <class 'list'> while supported dtype is `dict`. This may be due to usage of other wrappers in the wrong order.

Can this please be fixed? If it matters, I am running the code on Lightning Studio with Python 3.10.
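
In the meantime, one possible workaround is to convert the list back into a dict before the Gymnasium wrapper sees it. The following is only a minimal sketch under stated assumptions: `infos_list_to_dict` and `DictInfoAdapter` are hypothetical names of my own, not part of SuperSuit or Gymnasium, and the dict-of-lists layout is an assumption about what downstream code can consume. I have not run it against CleanRL's script.

```python
from typing import Any


def infos_list_to_dict(infos: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Merge a per-agent list of info dicts into one dict of per-agent lists.

    Slots for agents that did not report a given key are filled with None.
    """
    merged: dict[str, list[Any]] = {}
    for idx, info in enumerate(infos):
        for key, value in info.items():
            merged.setdefault(key, [None] * len(infos))[idx] = value
    return merged


class DictInfoAdapter:
    """Hypothetical duck-typed adapter: forwards to `env`, converting list infos to a dict."""

    def __init__(self, env: Any) -> None:
        self.env = env

    def __getattr__(self, name: str) -> Any:
        # Only reached for attributes not found on the adapter itself.
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)

    def reset(self, **kwargs: Any):
        observations, infos = self.env.reset(**kwargs)
        return observations, self._convert(infos)

    def step(self, actions: Any):
        observations, rewards, terminations, truncations, infos = self.env.step(actions)
        return observations, rewards, terminations, truncations, self._convert(infos)

    @staticmethod
    def _convert(infos: Any) -> Any:
        # Leave infos untouched if they are already a dict.
        return infos_list_to_dict(infos) if isinstance(infos, list) else infos
```

The adapter would sit between the SuperSuit vector env and `RecordEpisodeStatistics`. Because it is duck-typed, it may need to subclass a Gymnasium wrapper class instead if the installed Gymnasium version checks the type of the wrapped env.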

Farama-Foundation / SuperSuit, issue #249