KeyError when accessing agent info in pettingzoo example

itstyren commented 1 year ago

Hi, thanks for this fantastic project. I'm new to the field of MARL, and when I tried to run the provided example I encountered the following error. I would greatly appreciate any help with this.

Issue Type

Bug Report

Bug Description

Encountered a KeyError while accessing agent information in the pettingzoo.utils.env module when running the provided PettingZoo example.

Steps to Reproduce

Clone the repository locally.
Set up the environment with pip install dm-meltingpot and pip install -r examples/requirements.txt.

Run the following code snippet:

python -m examples.pettingzoo.sb3_train

Expected Behavior

Melting Pot substrates are wrapped as a PettingZoo ParallelEnv and a model is trained with the PPO algorithm.

Actual Behavior

Instead, training fails with a KeyError and the following traceback:

  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/stable_baselines3/ppo/ppo.py", line 308, in learn
    return super().learn(
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 259, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 178, in collect_rollouts
    new_obs, rewards, dones, infos = env.step(clipped_actions)
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/stable_baselines3/common/vec_env/base_vec_env.py", line 197, in step
    return self.step_wait()
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/stable_baselines3/common/vec_env/vec_transpose.py", line 95, in step_wait
    observations, rewards, dones, infos = self.venv.step_wait()
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/stable_baselines3/common/vec_env/vec_monitor.py", line 76, in step_wait
    obs, rewards, dones, infos = self.venv.step_wait()
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/supersuit/vector/sb3_vector_wrapper.py", line 25, in step_wait
    observations, rewards, terminations, truncations, infos = self.venv.step_wait()
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/supersuit/vector/concat_vec_env.py", line 75, in step_wait
    return self.step(self._saved_actions)
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/supersuit/vector/concat_vec_env.py", line 83, in step
    venv.step(
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/supersuit/vector/markov_vector_wrapper.py", line 69, in step
    observations, rewards, terms, truncs, infos = self.par_env.step(act_dict)
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/supersuit/generic_wrappers/utils/shared_wrapper_util.py", line 130, in step
    observations, rewards, terminations, truncations, infos = super().step(actions)
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/pettingzoo/utils/wrappers/base_parallel.py", line 48, in step
    res = self.env.step(actions)
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/pettingzoo/utils/conversions.py", line 190, in step
    obs, rew, termination, truncation, info = self.aec_env.last()
  File "/home/tyren/miniconda3/envs/meltingpot/lib/python3.10/site-packages/pettingzoo/utils/env.py", line 190, in last
    self.infos[agent],
KeyError: 'player_0'

Environment

OS: Linux-5.10.16.3-microsoft-standard-WSL2-x86_64-with-glibc2.35
Python version: 3.10.12
Stable-Baselines3: 2.0.0
PyTorch: 2.0.1+cu117
GPU Enabled: True
PettingZoo: 1.23.1

Additional information

To run this example, I also changed line 35 of pettingzoo/utils.py to:

# The build method expects the env name.
self._env = substrate.build(
    env_name, roles=self.env_config.default_player_roles
)

Also, the env name should be commons_harvest__open (with a double underscore) instead of commons_harvest_open.

Possible solution

I've found that commenting out line 190 of site-packages/pettingzoo/utils/env.py makes the error go away. However, I'm hesitant to consider this a proper solution.
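
A less invasive alternative to editing files under site-packages might be to make sure every agent has an info entry before it is read. The snippet below is only a sketch of that idea: backfill_infos is a hypothetical helper, not part of PettingZoo or Melting Pot, and where it would need to be called (for example, in the example's own environment wrapper) is an assumption that has not been tested against the real libraries.

```python
from typing import Any, Dict, Iterable


def backfill_infos(
    infos: Dict[str, Dict[str, Any]], agents: Iterable[str]
) -> Dict[str, Dict[str, Any]]:
    """Return a copy of `infos` with an empty dict for every agent lacking an entry.

    Hypothetical helper for illustration; not a PettingZoo API.
    """
    filled = dict(infos)
    for agent in agents:
        # Only add a default; existing entries are left untouched.
        filled.setdefault(agent, {})
    return filled


# Example: 'player_0' has no info entry, as in the traceback above.
infos = {"player_1": {"score": 3}}
patched = backfill_infos(infos, ["player_0", "player_1"])
print(patched["player_0"])  # {}
```

Because the helper returns a copy and only adds missing keys, any info that an environment does report is preserved.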

duenez commented 1 year ago

There is currently an incompatibility between the PettingZoo and RLlib examples, as they require different versions of Gym/Gymnasium. This means that the examples are broken and it is hard to test them on our side.

I cannot immediately see why player_0 wouldn't be present in the infos, other than its entry being deleted (due to a terminal state) before it is accessed. Unfortunately we cannot easily test this on our side.
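
To illustrate the kind of failure described here, the following toy sketch reproduces the same KeyError pattern. ToyAECEnv is a made-up stand-in that only mimics per-agent dictionaries keyed by agent name; it is not PettingZoo code, and the deletion step is an assumption about what might happen, not a confirmed diagnosis.

```python
class ToyAECEnv:
    """Toy stand-in (not PettingZoo code): per-agent dicts keyed by agent name."""

    def __init__(self):
        self.agent_selection = "player_0"
        self.rewards = {"player_0": 0.0}
        self.terminations = {"player_0": True}
        self.infos = {"player_0": {}}

    def last(self):
        # Reads each per-agent dict for the currently selected agent.
        agent = self.agent_selection
        return self.rewards[agent], self.terminations[agent], self.infos[agent]


env = ToyAECEnv()
# Assumed scenario: the finished agent's info entry is removed before it is read.
del env.infos["player_0"]

try:
    env.last()
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError('player_0')
```

In this toy model the lookup fails only because the infos entry is missing while the agent is still selected, which matches the last frame of the traceback above.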

For what it's worth, we don't use the infos internally, so your fix would be fine in terms of functionality.

itstyren commented 11 months ago

It appears to be an issue with PettingZoo, which has been resolved in this latest update: https://github.com/Farama-Foundation/PettingZoo/pull/1082.
