Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License
7.32k stars 817 forks source link

[Bug Report] max_episode_steps is not passed to the env's spec attribute anymore #871

Closed vmoens closed 10 months ago

vmoens commented 10 months ago

Describe the bug

In previous versions of gym, an env registered with max_episode_steps=N could see its env.spec.max_episode_steps refelect this value. Now this attribute is automatically set to None even if the env is explicitely registered with this

Would it make sense to keep the value from the registration in the env spec, or set it to None only if max_episode_steps is passed when make is called, ie

# max_episode_steps is proper to the env
register(envname0, max_episode_steps=N)
make(envname0)  # env.unwrapped.spec.max_episode_steps == N
# max_episode_steps is just there to tell us to wrap it in a TimeLimit
register(envname1, max_episode_steps=None)
make(envname0, max_episode_steps=None)  # env.unwrapped.spec.max_episode_steps == None

Otherwise, it's hard for us to know what the env horizon is (we don't need a TimeLimit, the env is terminated at max_episode_steps regardless of that)

Happy to make a PR to solve this issue

cc @vikashplus

Code example

No response

System info

No response

Additional context

No response

Checklist

pseudo-rnd-thoughts commented 10 months ago

Sorry I'm a bit confused at the request.

A change was made a version ago such that an environment's spec was only equal to the environment stack's equivalent spec allowing you to recreate a particular environment stack. See https://github.com/Farama-Foundation/Gymnasium/pull/292 However, this resulted in env.unwrapped.spec having several parameters "disabled", e.g. max_episode_steps, as the unwrapped means the TimeLimit wrapper is not applied currently.

My understanding of make's implementation is that max_episode_steps only refers to TimeLimit and cannot even be passed to the environment as a kwargs, only with register(..., kwargs={"max_episode_steps": X})

If I understand, can you not access this information with env.spec.max_episode_steps removing the unwrapped

vmoens commented 10 months ago

Let me explain:

pseudo-rnd-thoughts commented 10 months ago

Now the spec being written in the env do not match the specs passed during registration. That seems odd to me. I think it's a valid usage for people to be able to gather the specs they passed during registration once the env is created.

This is true, just for env not env.unwrapped, let me know if you find a case where gym.make("X").spec != gym.spec("X")

vmoens commented 10 months ago

Hmm, at least in robohive our env.unwrapped.max_episode_steps* are always None when we use gymnasium as backend, not with gym! This is because it is overwritten here

@vikashplus can testify

*this snippet uses env.spec but we get the same result with env.unwrapped

vikashplus commented 10 months ago

I'm dealing with the same issue. Here is a small example

import gym
ee = gym.make("Pendulum-v0")
print(ee.spec.max_episode_steps, ee.unwrapped.spec.max_episode_steps) # Prints 200, 200
import gymnasium as gym
ee = gym.make("Pendulum-v1")
print(ee.spec.max_episode_steps, ee.unwrapped.spec.max_episode_steps) # Prints 200, None

The situation gets worse when an env has multiple wrappers. The env fails to access its own horizon via self.spec.max_episode_steps. All of the followings (ee.env.spec.max_episode_steps, ee.env.env.env.spec.max_episode_steps, ee.env.env.env.unwrapped.spec.max_episode_steps) ends up returning None

pseudo-rnd-thoughts commented 10 months ago

Sorry, I'm a bit confused by what the bug is, as this is the intended behaviour for Gymnasium

Is it that you are trying to access the max_episode_steps from within the environment? Otherwise, if you have access to the environment stack, then you should just need to change env.unwrapped.max_episode_steps to env.max_episode_step and this works for Gym and Gymnasium

vmoens commented 10 months ago

I guess the question is: if it's in the spec during registration, why isn't it in the unwrapped env.spec? I get why it can't be found if it's passed during a call to gym.make, but if you pass it during register to me you're saying that this is an intrinsic attribute of your env, and hence something that you may want to access at runtime. We've found a convoluted way of recovering that info but I think the semantic here isn't super clear for the users (especially given that it used to work and was pretty convenient, and many of us would think this feature made sense)

vikashplus commented 10 months ago

+1 to all @vmoens points. I also think max_episode_steps (amongst other passed specs) should be accessible from within the environment.

pseudo-rnd-thoughts commented 10 months ago

We added to v0.29 the ability to recreate an instance of an environment (for the whole environment stack using spec).

Therefore, we were required to change the implementation to spec as to recreate unwrapped env is completely different to env.

Read #292 for the technical details

vmoens commented 10 months ago

Unfortunate!

I'd rather close this as non-planned than completed then, as it is an issue for us. I reitarate that IMO env.unwrapped.spec should match the specs passed during registration but I guess it's not open do discussion.

pseudo-rnd-thoughts commented 10 months ago

I reitarate that IMO env.unwrapped.spec should match the specs passed during registration but I guess it's not open do discussion.

Is there a problem with using env.spec rather env.unwrapped.spec?

Jdvakil commented 9 months ago

I have been dealing with the same bug since past week with different envs

vikashplus commented 9 months ago

@Jdvakil: Sorry to hear that. We are struggling at our end too. I'll send you a ping if we figure out a workaround.

@pseudo-rnd-thoughts -- see my comment here. I have provided a detailed example. The issue is in env.unwrapped.spec. @vmoens also pointed at the bug and what a fix might look like. Do you see any gap is this solution?

copy pasting a part of it here

import gymnasium as gym
ee = gym.make("Pendulum-v1")
print(ee.spec.max_episode_steps, ee.unwrapped.spec.max_episode_steps) 
# Prints 200, None. (It should be 200,200)
vmoens commented 9 months ago

Is there a problem with using env.spec rather env.unwrapped.spec?

Yes, in that we don't want to use any wrapper. The envs are terminating by themselves. On top of this the unwrapped env must know its horizon, it doesn't and shouldn't know what its wrapper is.

pseudo-rnd-thoughts commented 9 months ago

@vikashplus As I explained above, we don't view this as a bug, this is a feature change where the definition of spec has changed from the registry data, i.e., gym.spec(env_id) to the current environment specification for an environment such that for a given environment the whole environment can be recreated accurately. This is a critical feature for offline RL and the reproducibility of environments.

@vmoens That makes sense, do you mean terminating or truncating with the time limit. If terminating, then it would seem improper to include a TimeLimit wrapper anyway so not using max_episode_limit and rather passing a variable to the environment would be the proper approach. I understand this would require modifying historical code for gymnasium but looking by at #292 I don't see way of us adding the features we added and keeping max_episode_limit as you would like

vikashplus commented 9 months ago

@pseudo-rnd-thoughts In that case what is the recommended way for an env to access its own horizon (previously max_episode_steps) from within the environment?

pseudo-rnd-thoughts commented 9 months ago

@pseudo-rnd-thoughts In that case what is the recommended way for an env to access its own horizon (previously max_episode_steps) from within the environment?

Good question, I would recommend internalise the variable such you use an environment parameter.

While I believe it can be called max_episode_steps, I don't believe you can modify the argument from make and you will need to use register kwargs to specify the value