Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License

[Proposal] Configuring dict spaces for rewards and terminations #973

Closed. Mayankm96 closed this issue 7 months ago

Mayankm96 commented 7 months ago

Proposal

To support a wider range of algorithmic work (multi-agent RL, multi-critic learning), it would be great to extend the supported types of rewards and terminations beyond a scalar box range and a bool, respectively.

Motivation

No response

Pitch

Gymnasium natively supports dict spaces for observations. The proposal is to do the same for the other MDP signals (rewards and terminations).
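For reference, this is how Gymnasium's existing Dict space already groups observations; the proposal asks for the same kind of grouping for rewards and terminations. The key names below are made up for illustration.

```python
import numpy as np
from gymnasium import spaces

# Observations can already be grouped into named sub-spaces with spaces.Dict:
observation_space = spaces.Dict(
    {
        "proprio": spaces.Box(low=-1.0, high=1.0, shape=(12,), dtype=np.float32),
        "camera": spaces.Box(low=0, high=255, shape=(64, 64, 3), dtype=np.uint8),
    }
)
sample = observation_space.sample()  # a dict with "proprio" and "camera" arrays
```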

Alternatives

No response

Additional context

Since Python does not enforce type hints, nothing stops users from returning a dict of rewards and terminations and handling it in their learning library. However, it would be nice to support this more explicitly.
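A minimal sketch of what this describes: because Python does not enforce the return-type hints of step(), an environment can already hand back dicts for the reward and terminated entries and let the learning library deal with them. The group and agent names below are illustrative, not part of any API.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class DictSignalEnv(gym.Env):
    """Toy environment returning dict rewards/terminations (names are illustrative)."""

    def __init__(self):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.Discrete(2)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        obs = self.observation_space.sample()
        # The type hints say SupportsFloat and bool, but nothing enforces them:
        reward = {"task": 1.0, "penalty": -0.1}
        terminated = {"agent_0": False, "agent_1": True}
        return obs, reward, terminated, False, {}
```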


Kallinteris-Andreas commented 7 months ago

Multi-objective Gymnasium already supports a reward vector.

Why do you need non-bool terminations? Note that we already have info, which can include termination components.

Also, can you check your email inbox (***@ethz.ch)? Thanks.

Mayankm96 commented 7 months ago

Based on here, I see it does extend the array type to become a tensor of shape (num_envs, num_rewards). However, we want to use dictionaries because they are easier and clearer to handle in downstream learning frameworks. This, of course, depends on the application and preferences. That's why in Orbit, at least for observations, we let users decide whether to return dicts or a flat vector. We'd like to support the same for rewards and terminations.
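Not the author's code, just a sketch of the difference being discussed: a dict of named reward groups can be collapsed into the (num_envs, num_rewards) tensor layout when a downstream library expects vectors, but the group names are lost. All names and shapes below are hypothetical.

```python
import numpy as np

# Hypothetical per-group rewards from a vectorized env with 3 sub-environments:
reward_dict = {
    "tracking": np.array([0.9, 0.7, 0.8]),
    "energy": np.array([-0.1, -0.2, -0.1]),
}

# Collapsing to the (num_envs, num_rewards) layout drops the group names:
reward_tensor = np.stack(list(reward_dict.values()), axis=-1)  # shape (3, 2)
```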

Why do you need non-bool terminations? Note that we already have info, which can include termination components.

For multi-agent RL: you want separate bool signals to tell which agent did something wrong in a decentralized learning setup. The principle for us goes in the same direction as observations, i.e. having "groups" of observations/rewards/terminations.

Ideally, we would like to maintain a single inheritance of gym.VecEnv / gym.Env to avoid confusion. But from what I see in the MOVecEnv definition, subclasses are free to define the spaces for all the signals based on their requirements. If that's the recommended path, then it's alright; we can handle the dict/box spaces internally in our framework's RL class.

Kallinteris-Andreas commented 7 months ago
  1. I'm not opposed to expanding the API, but this is something we have to discuss after the 1.0 release.
  2. For your case, for now, it would be best for you to extend the API class (a sketch follows this list).
  3. Are you doing multi-agent reinforcement learning with the Gymnasium API instead of the PettingZoo parallel API?
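A minimal sketch of option 2 above: subclass gym.Env and attach framework-specific space attributes for the extra signals. Note that reward_space and termination_space are not part of the Gymnasium API; they are hypothetical names a downstream framework could standardise on.

```python
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class GroupedSignalEnv(gym.Env):
    """Sketch only: extends the standard Env with non-standard signal spaces."""

    def __init__(self):
        self.observation_space = spaces.Dict(
            {"policy": spaces.Box(-1.0, 1.0, shape=(8,), dtype=np.float32)}
        )
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)
        # Hypothetical, framework-specific descriptions of the grouped signals:
        self.reward_space = spaces.Dict(
            {
                "task": spaces.Box(-np.inf, np.inf, shape=(1,), dtype=np.float32),
                "penalty": spaces.Box(-np.inf, 0.0, shape=(1,), dtype=np.float32),
            }
        )
        self.termination_space = spaces.Dict(
            {"agent_0": spaces.MultiBinary(1), "agent_1": spaces.MultiBinary(1)}
        )
```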
Mayankm96 commented 7 months ago

No, I wasn't aware of PettingZoo. But based on the documentation, it seems very close to the proposal above (dicts of rewards and terminations).
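For comparison, a sketch of the PettingZoo parallel API (recent versions), which already returns per-agent dicts for rewards, terminations, and truncations; pistonball is one of its reference environments.

```python
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.parallel_env()
observations, infos = env.reset(seed=0)

while env.agents:
    # One action per live agent, keyed by agent name:
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
    # rewards and terminations are also dicts keyed by agent name.
env.close()
```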

I'll go with option 2 for the framework. I'd be happy to engage in further discussions on keeping "one" base Env definition for all these applications.

pseudo-rnd-thoughts commented 7 months ago

Gymnasium is intended for single-agent RL environments only, allowing other projects to specialise in multi-objective / multi-reward and multi-agent settings. Therefore, we will almost certainly not be adding these features to this project.

Mayankm96 commented 7 months ago

Maybe I am missing the consequences of changing the default type spec of reward_range to a more general type.

Nonetheless, if the project is focused only on single-agent RL, then this discussion is moot. Thanks for the help.

pseudo-rnd-thoughts commented 7 months ago

Thanks for understanding, @Mayankm96. It sounds like an interesting project; good luck with it!