Closed vmoens closed 10 months ago
Sorry I'm a bit confused at the request.
A change was made a version ago such that an environment's spec was only equal to the environment stack's equivalent spec allowing you to recreate a particular environment stack. See https://github.com/Farama-Foundation/Gymnasium/pull/292
However, this resulted in env.unwrapped.spec
having several parameters "disabled", e.g. max_episode_steps
, as the unwrapped means the TimeLimit
wrapper is not applied currently.
My understanding of make
's implementation is that max_episode_steps
only refers to TimeLimit
and cannot even be passed to the environment as a kwargs
, only with register(..., kwargs={"max_episode_steps": X})
If I understand, can you not access this information with env.spec.max_episode_steps
removing the unwrapped
Let me explain:
env.unwrapped.spec
would reflect exactly the specs we had set when calling register
(see example above), hence the env had access to its horizonNow the spec being written in the env do not match the specs passed during registration. That seems odd to me. I think it's a valid usage for people to be able to gather the specs they passed during registration once the env is created.
This is true, just for env
not env.unwrapped
, let me know if you find a case where gym.make("X").spec != gym.spec("X")
Hmm, at least in robohive our env.unwrapped.max_episode_steps
* are always None
when we use gymnasium as backend, not with gym! This is because it is overwritten here
@vikashplus can testify
*this snippet uses env.spec
but we get the same result with env.unwrapped
I'm dealing with the same issue. Here is a small example
import gym
ee = gym.make("Pendulum-v0")
print(ee.spec.max_episode_steps, ee.unwrapped.spec.max_episode_steps) # Prints 200, 200
import gymnasium as gym
ee = gym.make("Pendulum-v1")
print(ee.spec.max_episode_steps, ee.unwrapped.spec.max_episode_steps) # Prints 200, None
The situation gets worse when an env has multiple wrappers. The env fails to access its own horizon via self.spec.max_episode_steps
. All of the followings (ee.env.spec.max_episode_steps, ee.env.env.env.spec.max_episode_steps, ee.env.env.env.unwrapped.spec.max_episode_steps) ends up returning None
Sorry, I'm a bit confused by what the bug is, as this is the intended behaviour for Gymnasium
Is it that you are trying to access the max_episode_steps
from within the environment?
Otherwise, if you have access to the environment stack, then you should just need to change env.unwrapped.max_episode_steps
to env.max_episode_step
and this works for Gym and Gymnasium
I guess the question is: if it's in the spec during registration, why isn't it in the unwrapped env.spec?
I get why it can't be found if it's passed during a call to gym.make
, but if you pass it during register
to me you're saying that this is an intrinsic attribute of your env, and hence something that you may want to access at runtime.
We've found a convoluted way of recovering that info but I think the semantic here isn't super clear for the users (especially given that it used to work and was pretty convenient, and many of us would think this feature made sense)
+1 to all @vmoens points.
I also think max_episode_steps
(amongst other passed specs) should be accessible from within the environment.
We added to v0.29 the ability to recreate an instance of an environment (for the whole environment stack using spec).
Therefore, we were required to change the implementation to spec as to recreate unwrapped env is completely different to env.
Read #292 for the technical details
Unfortunate!
I'd rather close this as non-planned than completed then, as it is an issue for us. I reitarate that IMO env.unwrapped.spec
should match the specs passed during registration but I guess it's not open do discussion.
I reitarate that IMO env.unwrapped.spec should match the specs passed during registration but I guess it's not open do discussion.
Is there a problem with using env.spec
rather env.unwrapped.spec
?
I have been dealing with the same bug since past week with different envs
@Jdvakil: Sorry to hear that. We are struggling at our end too. I'll send you a ping if we figure out a workaround.
@pseudo-rnd-thoughts -- see my comment here. I have provided a detailed example. The issue is in env.unwrapped.spec
. @vmoens also pointed at the bug and what a fix might look like. Do you see any gap is this solution?
copy pasting a part of it here
import gymnasium as gym
ee = gym.make("Pendulum-v1")
print(ee.spec.max_episode_steps, ee.unwrapped.spec.max_episode_steps)
# Prints 200, None. (It should be 200,200)
Is there a problem with using env.spec rather env.unwrapped.spec?
Yes, in that we don't want to use any wrapper. The envs are terminating by themselves. On top of this the unwrapped env must know its horizon, it doesn't and shouldn't know what its wrapper is.
@vikashplus As I explained above, we don't view this as a bug, this is a feature change where the definition of spec
has changed from the registry
data, i.e., gym.spec(env_id)
to the current environment specification for an environment such that for a given environment the whole environment can be recreated accurately. This is a critical feature for offline RL and the reproducibility of environments.
@vmoens That makes sense, do you mean terminating or truncating with the time limit. If terminating, then it would seem improper to include a TimeLimit
wrapper anyway so not using max_episode_limit
and rather passing a variable to the environment would be the proper approach.
I understand this would require modifying historical code for gymnasium but looking by at #292 I don't see way of us adding the features we added and keeping max_episode_limit
as you would like
@pseudo-rnd-thoughts In that case what is the recommended way for an env to access its own horizon (previously max_episode_steps) from within the environment?
@pseudo-rnd-thoughts In that case what is the recommended way for an env to access its own horizon (previously max_episode_steps) from within the environment?
Good question, I would recommend internalise the variable such you use an environment parameter.
While I believe it can be called max_episode_steps, I don't believe you can modify the argument from make and you will need to use register kwargs to specify the value
Describe the bug
In previous versions of gym, an env registered with
max_episode_steps=N
could see itsenv.spec.max_episode_steps
refelect this value. Now this attribute is automatically set to None even if the env is explicitely registered with thisWould it make sense to keep the value from the registration in the env spec, or set it to None only if
max_episode_steps
is passed whenmake
is called, ieOtherwise, it's hard for us to know what the env horizon is (we don't need a TimeLimit, the env is terminated at
max_episode_steps
regardless of that)Happy to make a PR to solve this issue
cc @vikashplus
Code example
No response
System info
No response
Additional context
No response
Checklist