Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License
7.49k stars, 841 forks

Rework rendering API to allow for simultaneous "human" and "rgb_array" mode rendering #1010

Open TimSchneider42 opened 7 months ago

TimSchneider42 commented 7 months ago

Proposal

I propose changing the gym environment API so that human and rgb_array rendering can be enabled and disabled independently. Instead of having to choose between no rendering, "human" rendering, and "rgb_array" rendering, there should be another option to enable both.

Motivation

One thing that has always bothered me about gym is that there is no way of rendering environments in "rgb_array" mode, while also displaying their "human" visualization. However, there are valid use-cases for doing this, e.g., if the policy expects a rendered image as input, but one also wants to see what is currently happening in the scene.

Additionally, when using a simulator like PyBullet or IsaacSim in gym environments, the simulator already provides an interface, which simply has to be enabled at the start. The interfaces these simulators provide are interactive and are thus much more informative for the user than the static "rgb_array" outputs. However, currently, one has to choose between seeing the interfaces and rendering numpy images.

In my opinion, there is no reason why the rendering modes should be mutually exclusive.

Pitch

This could be implemented by separating the "human" rendering mode from the render_mode variable. One could add a boolean option, human_rendering, which enables or disables "human" rendering. If enabled, human rendering is always performed in the step function (I think this is already the case).

The render function specification changes so that it is no longer allowed to return None. Instead, it should throw an exception if it is called without any rendering mode active and else return the rendered image in the specified format.
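A minimal sketch of how the pitch above might look is given here. It is not Gymnasium code: `SketchEnv`, `NoRenderModeError`, and the `display` callback are hypothetical names, only `render_mode` and `human_rendering` come from the proposal, and nested lists stand in for image arrays so the example runs without dependencies.

```python
from typing import Callable, List, Optional

Frame = List[List[int]]  # stand-in for an RGB image array


class NoRenderModeError(RuntimeError):
    """Hypothetical error for calling render() with no render mode set."""


class SketchEnv:
    """Toy environment illustrating the proposed split; not a Gymnasium class."""

    def __init__(
        self,
        render_mode: Optional[str] = None,
        human_rendering: bool = False,
        display: Callable[[Frame], None] = lambda frame: None,
    ) -> None:
        if render_mode not in (None, "rgb_array"):
            raise ValueError(f"unsupported render_mode: {render_mode!r}")
        self.render_mode = render_mode          # what render() returns
        self.human_rendering = human_rendering  # independent on/off switch
        self._display = display                 # assumed hook that shows a frame
        self._t = 0

    def _draw(self) -> Frame:
        # A 2x2 "image" whose pixel value is the current time step.
        return [[self._t, self._t], [self._t, self._t]]

    def step(self) -> None:
        self._t += 1
        if self.human_rendering:
            # Human rendering happens in step(), as the proposal describes.
            self._display(self._draw())

    def render(self) -> Frame:
        if self.render_mode is None:
            # Proposed behaviour: raise instead of returning None.
            raise NoRenderModeError("render() called without a render mode")
        return self._draw()
```

With `render_mode="rgb_array"` and `human_rendering=True`, each `step()` shows a frame through the callback while `render()` still returns the frame to the caller.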

Alternatives

A workaround is to write a wrapper that takes "rgb_array" images generated by gym.Env.render and displays them as well as returning them in its own render function. For custom environments using simulators like PyBullet and IsaacSim, a custom option can be implemented to display their interfaces even if the rendering mode is not set to human.
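The wrapper workaround could look roughly like the sketch below. The names `ShowAndReturnWrapper` and `FakeRgbEnv` and the `display` callback are hypothetical, and the wrapper is duck-typed so that the example runs on its own; in a real project one would more likely subclass `gymnasium.Wrapper` and show the frame in a window.

```python
from typing import Any, Callable, List

Frame = List[List[int]]  # stand-in for an RGB image array


class FakeRgbEnv:
    """Tiny stand-in for an environment created with render_mode='rgb_array'."""

    render_mode = "rgb_array"

    def render(self) -> Frame:
        return [[0, 0], [0, 0]]


class ShowAndReturnWrapper:
    """Displays every frame produced by the wrapped env and also returns it."""

    def __init__(self, env: Any, display: Callable[[Frame], None]) -> None:
        self.env = env
        self._display = display  # assumed hook, e.g. a function drawing to a window

    def render(self) -> Frame:
        frame = self.env.render()
        if frame is not None:
            self._display(frame)  # the "human" side effect
        return frame              # the "rgb_array" return value

    def __getattr__(self, name: str) -> Any:
        # Forward everything else (step, reset, render_mode, ...) to the env.
        if name == "env":
            raise AttributeError(name)
        return getattr(self.env, name)
```

This only covers environments that can produce "rgb_array" frames; it does not help with interactive simulator interfaces, which is the second case mentioned above.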

pseudo-rnd-thoughts commented 7 months ago

While it is not standard, I don't see why env.render() with human can't return the rgb array if available, rather than None as it does currently. Would this solve the issue you're discussing?

Kallinteris-Andreas commented 7 months ago

An alternate solution would be to allow multiple render modes at the same time. Example: render_mode = ["human", "rgb_array", "rgb_array"]. This would also enable multiple cameras.

Note: this has been previously suggested in https://github.com/openai/gym/issues/3038

TimSchneider42 commented 7 months ago

> While it is not standard, I don't see why env.render() with human can't return the rgb array if available, rather than None as it does currently. Would this solve the issue you're discussing?

I see three issues with the way rendering is currently implemented:

  1. For existing environments (e.g. CartPole), there is no longer any way of both rendering rgb_array images and showing the human interface. In the old API, this was possible by calling Env.render twice with different arguments, as pointed out in the replies to https://github.com/openai/gym/issues/3038.
  2. Once render_mode is set to "human", it is no longer possible to specify what env.render() should return (“rgb_array”, “rgb_array_list”, “ansi”, “ansi_list”).
  3. Some environments might need to know in advance whether or not they should create "rgb_array" renderings (e.g. for creating buffers or rendering pipelines). If we choose to let Env.render() always return rgb_array renderings, we also need to prepare for rendering every time, whether or not it is needed.

> An alternate solution would be to allow multiple render modes at the same time. Example: render_mode = ["human", "rgb_array", "rgb_array"]. This would also enable multiple cameras.

I feel like this would overcomplicate things, as certain combinations don't really make sense (e.g., combining "rgb_array" and "rgb_array_list"). Also, the render functions would have to be rewritten so that they return multiple different renderings, e.g., "ansi" and "rgb_array". All I would like is to be able to enable human rendering alongside the other rendering modes.
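To make the trade-off concrete, here is a rough sketch of what the list-based alternative might involve. Everything in it is an assumption for illustration: `MultiModeEnv` and `validate_modes` are hypothetical names, the rule that rejects a mode together with its `_list` variant is just one possible reading of the objection above, and nothing here reflects an actual Gymnasium API.

```python
from typing import List, Sequence, Tuple, Union

Frame = List[List[int]]              # stand-in for an RGB image array
Rendering = Union[Frame, str, None]  # one entry per requested mode


def validate_modes(modes: Sequence[str]) -> None:
    """Reject a mode combined with its own '_list' variant (assumed rule)."""
    requested = set(modes)
    for mode in requested:
        if mode + "_list" in requested:
            raise ValueError(f"{mode!r} cannot be combined with {mode + '_list'!r}")


class MultiModeEnv:
    """Toy environment accepting a list of render modes; not a Gymnasium class."""

    SUPPORTED = ("human", "rgb_array", "ansi")

    def __init__(self, render_mode: Sequence[str]) -> None:
        validate_modes(render_mode)
        for mode in render_mode:
            if mode not in self.SUPPORTED:
                raise ValueError(f"unsupported render mode: {mode!r}")
        self.render_mode = list(render_mode)

    def render(self) -> Tuple[Rendering, ...]:
        # render() now has to return one result per mode, in request order.
        results: List[Rendering] = []
        for mode in self.render_mode:
            if mode == "rgb_array":
                results.append([[0, 0], [0, 0]])
            elif mode == "ansi":
                results.append("ab")
            else:  # "human": shown on screen, nothing to return
                results.append(None)
        return tuple(results)
```

Callers would then have to unpack a tuple whose shape depends on the requested modes, which is the extra complexity the comment above is concerned about.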

RogerJL commented 6 months ago

There is a point in letting the HumanRender be transparent, as it actually is not possible to observe/verify what is rendered. The other modes could be streamed to file or back into ML-code, but "human" can't...