Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License

[Question] is `Env.reset(options)` being used? #869

Closed Kallinteris-Andreas closed 9 months ago

Kallinteris-Andreas commented 9 months ago

Question

`Env.reset` has the `options` argument, which from what I can tell is not being used; instead, the "reset options" are passed to `Env.__init__`.

Should the `options` argument be removed? In what situation would it make sense? Its presence implies that the distribution of initial states should change during training or evaluation; is that something we want?

pseudo-rnd-thoughts commented 9 months ago

Ironically, `reset(..., options)` was one of the changes made in v0.21, I believe, just before I joined the project. I believe the reasoning was that you might want to change parameters each time before resetting the environment, and that passing them to `reset` is clearer than having to modify the environment itself.

That being said, only one of the Gym environments actually uses `options` currently, which could probably be changed if we want.

@RedTachyon I believe you helped make this change, do you have anything to add?

RedTachyon commented 9 months ago

Disclaimer: I was the main person who proposed/spearheaded adding this feature, so I'm obviously biased.

As an anecdote, I used it in my PhD research literally all the time. The idea is that you can make some changes to the environment without having to recreate the entire environment. Consider e.g. an environment that internally uses a Unity executable. Terminating the process and then launching it again will take several seconds, which would be very problematic in the case of e.g. meta-learning/generalization studies. In other cases, it might not affect performance but would instead simplify the code, e.g. if you want to do a curriculum on CartPole with a progressively widening initialization range.
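A minimal sketch of the curriculum idea, for illustration only. To stay dependency-free it does not subclass `gymnasium.Env`; the `ToyCartPole` class is a hypothetical stand-in that only mirrors the `reset(*, seed=None, options=None)` signature, and the `"init_range"` key is an assumed name, not an option defined by any built-in environment.

```python
import random
from typing import Any, Optional


class ToyCartPole:
    """Hypothetical stand-in env; only mirrors the reset(*, seed, options) signature."""

    def __init__(self) -> None:
        # Expensive setup (e.g. launching an external simulator) would happen once here.
        self._rng = random.Random()
        self.state: list[float] = [0.0] * 4

    def reset(self, *, seed: Optional[int] = None,
              options: Optional[dict[str, Any]] = None):
        if seed is not None:
            self._rng = random.Random(seed)
        # "init_range" is an assumed key; fall back to a small default when absent.
        init_range = (options or {}).get("init_range", 0.05)
        self.state = [self._rng.uniform(-init_range, init_range) for _ in range(4)]
        return list(self.state), {"init_range": init_range}


# Curriculum: widen the initial-state range each stage, reusing the same instance.
env = ToyCartPole()
widths = []
for stage in range(5):
    init_range = 0.05 * (stage + 1)
    obs, info = env.reset(seed=stage, options={"init_range": init_range})
    widths.append(info["init_range"])
```

The environment object is created once and only the per-episode parameters change, which is the case where recreating the environment (or restarting an external process) would be wasteful. Callers who pass no `options` get the default range.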

The built-in environments don't currently support that many options, but maybe we should change that by adding more flexibility in that direction. Still, built-in environments are like 5% of the utility of Gymnasium; no ground-breaking research happens on CartPole and LunarLander. The point is having a shared API for various types of research, and this element of the API is completely ignorable for those who don't care, but instead it enables/simplifies a whole different class of research within the shared API.

I can't say with any certainty how commonly it's used by other people. I might have some more anecdotes, but that's not really useful.

Is there any reason why we should remove this feature? For the record, I'm strongly opposed to doing that, but maybe there's something I didn't consider, and I don't really see any concrete arguments in this thread.

Kallinteris-Andreas commented 9 months ago

I suppose it is the most natural way to change initialization parameters throughout training.