Farama-Foundation / SuperSuit

A collection of wrappers for Gymnasium and PettingZoo environments (being merged into gymnasium.wrappers and pettingzoo.wrappers)

[Bug] Final Observation is not returned when using pettingzoo_env_to_vec_env_v1/MarkovVectorEnv #232

Closed KaleabTessera closed 8 months ago

KaleabTessera commented 9 months ago

Description

When using pettingzoo_env_to_vec_env_v1, infos[agent]["terminal_observation"] should be returned at the end of an episode, as per this line. This doesn't happen, because infos gets overwritten here.

This appears to be an issue because, otherwise, I don't think there is any way to get the final observation. E.g. say you are on the final step of an episode and call this step: the final observation gets overwritten here, and you never get the final obs.
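To make the expected behaviour concrete, here is a minimal sketch of the auto-reset pattern described above. It does not use SuperSuit: CountdownEnv and AutoResetWrapper are hypothetical toy classes, and the only thing borrowed from the issue is the "terminal_observation" key. The point is the ordering: the final observation is stored in info before the reset replaces obs.

```python
class CountdownEnv:
    """Hypothetical toy env: the observation is the number of steps taken so far."""

    def __init__(self, horizon=3):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        return self.t, 0.0, terminated, False, {}


class AutoResetWrapper:
    """Hypothetical wrapper: resets on episode end but keeps the final observation."""

    def __init__(self, env):
        self.env = env

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        if terminated or truncated:
            # Stash the final observation *before* resetting, and do not let
            # the info dict returned by reset() replace the one we return.
            info = dict(info)
            info["terminal_observation"] = obs
            obs, _reset_info = self.env.reset()
        return obs, reward, terminated, truncated, info


env = AutoResetWrapper(CountdownEnv(horizon=3))
obs, info = env.reset()
done = False
while not done:
    obs, reward, terminated, truncated, info = env.step(action=0)
    done = terminated or truncated

print(obs)                           # observation of the freshly reset env
print(info["terminal_observation"])  # last observation of the finished episode
```

With this ordering, the caller sees both the reset observation (in obs) and the true final observation (in info), which is what the issue expects from pettingzoo_env_to_vec_env_v1.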

Reproduce

import supersuit as ss
from pettingzoo.mpe import simple_spread_v3
env = simple_spread_v3.parallel_env()
possible_agents = env.possible_agents
env = ss.pettingzoo_env_to_vec_env_v1(env)

observations, infos = env.reset()
terms = [False]
truncs = [False]
env_done = False
while not env_done:
    # this is where you would insert your policy
    actions_dict = {agent: int(env.action_space.sample()) for agent in possible_agents}
    # The vector env indexes actions by position, so pass a list rather than a dict view
    actions = list(actions_dict.values())

    observations, rewards, terms, truncs, infos = env.step(actions)
    env_done = (terms | truncs).all()
    print(observations, env_done)
env.close()

The final obs printed here comes from the newly reset environment, not from the last step of the finished episode.
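For reference, this is roughly how I would expect to consume the key once it is returned. It is only a sketch: true_next_observations is a hypothetical helper, the values below are made up rather than SuperSuit output, and it assumes infos is a per-agent sequence of dicts, as infos[agent]["terminal_observation"] suggests.

```python
def true_next_observations(observations, dones, infos):
    """Per agent slot, prefer the stored terminal observation when the episode ended."""
    result = []
    for obs, done, info in zip(observations, dones, infos):
        if done and "terminal_observation" in info:
            # The returned obs already belongs to the next episode.
            result.append(info["terminal_observation"])
        else:
            result.append(obs)
    return result


# Illustrative values only (assumption: two agent slots, the first one just finished).
observations = [[0.0, 0.0], [0.5, 0.1]]
dones = [True, False]
infos = [{"terminal_observation": [0.9, 0.8]}, {}]

next_obs = true_next_observations(observations, dones, infos)
print(next_obs)
```

Without the key in infos, a helper like this has nothing to fall back on, which is why the overwrite matters for anything that needs the real final observation.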