Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License

[Bug Report] shallow copy of info and observation after termination #1132

Open VictorTao1998 opened 3 months ago

VictorTao1998 commented 3 months ago

Describe the bug

When using async_vector_env, the info returned after an episode terminates is wrong: it is the info from reset() instead of the info from step().

Code example

In gymnasium/vector/async_vector_env.py, lines 642:646:

if terminated or truncated:
    old_observation, old_info = observation, info
    observation, info = env.reset()
    info["final_observation"] = old_observation
    info["final_info"] = old_info

I think old_observation and old_info should be deep-copied here.
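To make the concern concrete, here is a minimal sketch of how the branch quoted above behaves when an environment reuses its observation and info objects between calls. `BufferReusingEnv`, `autoreset_step`, and `run_until_done` are hypothetical names made up for illustration and are not part of Gymnasium. Whether any real environment mutates its return values in place like this is an assumption.

```python
import copy


class BufferReusingEnv:
    """Hypothetical toy environment that mutates and returns the same objects."""

    def __init__(self):
        self._obs = [0]            # reused observation buffer
        self._info = {"steps": 0}  # reused info dict

    def reset(self):
        self._obs[0] = 0
        self._info["steps"] = 0
        return self._obs, self._info

    def step(self):
        self._obs[0] += 1
        self._info["steps"] += 1
        terminated = self._obs[0] >= 3
        return self._obs, self._info, terminated


def autoreset_step(env, snapshot):
    """Mirror the quoted autoreset branch; `snapshot` toggles the deepcopy."""
    observation, info, terminated = env.step()
    if terminated:
        if snapshot:
            # Proposed change: copy the terminal data before reset() can touch it.
            old_observation, old_info = copy.deepcopy((observation, info))
        else:
            # Quoted behaviour: keep references only.
            old_observation, old_info = observation, info
        observation, info = env.reset()
        info["final_observation"] = old_observation
        info["final_info"] = old_info
    return observation, info, terminated


def run_until_done(snapshot):
    """Step until the episode ends and return the recorded terminal data."""
    env = BufferReusingEnv()
    env.reset()
    while True:
        _, info, terminated = autoreset_step(env, snapshot)
        if terminated:
            return info["final_observation"], info["final_info"]["steps"]


aliased = run_until_done(snapshot=False)  # terminal data overwritten by reset()
copied = run_until_done(snapshot=True)    # terminal data preserved
```

With this toy environment the aliased run reports the post-reset values as the "final" ones, while the copied run keeps the values from the terminal step. An environment that returns fresh objects from every call would not show the difference.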

System info

gymnasium==0.29.1

Additional context

No response

pseudo-rnd-thoughts commented 3 months ago

Thanks for the issue, but your message and your title seem to be about different things.

  1. On when the reset occurs (autoreset): for gym < 1 this behaviour is intentional, and we are changing it in v1.0.
  2. On obs and info not being deep-copied: could you provide an example where this causes a problem?
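For point 1, the snippet quoted in the issue means that on the terminating step the returned info is the one from reset(), with the step() data nested under the "final_observation" and "final_info" keys. A minimal sketch of separating the two for a single sub-environment's info dict follows. `split_autoreset_info` is a hypothetical helper, not a Gymnasium function, the example values are invented, and the batched layout returned by the vector environment itself may differ.

```python
AUTORESET_KEYS = ("final_observation", "final_info")


def split_autoreset_info(info):
    """Split one sub-environment's info into reset info and terminal-step data.

    Returns (reset_info, final_observation, final_info). The last two are
    None when the step did not end an episode.
    """
    reset_info = {k: v for k, v in info.items() if k not in AUTORESET_KEYS}
    return reset_info, info.get("final_observation"), info.get("final_info")


# Invented example of what the quoted branch would build after a termination.
info = {
    "lives": 3,                         # from reset()
    "final_observation": [0.5, -0.1],   # last observation from step()
    "final_info": {"lives": 0},         # last info from step()
}
reset_info, final_obs, final_info = split_autoreset_info(info)
```

Here `final_info` holds what step() reported at the end of the episode, and `reset_info` holds what reset() reported for the new one.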