RobertTLange / gymnax

RL Environments in JAX 🌍
Apache License 2.0

The action-mapping in DeepSea-bsuite does not behave like the original DeepSea environment #77

Open Pascal314 opened 1 month ago

In BSuite, DeepSea generates a random action mapping and keeps it fixed across resets. The main purpose of the random action mapping is to ensure that a DQN agent cannot trivially solve the environment just by having a bias towards the action "right".
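To make the behaviour I mean concrete, here is a minimal pure-Python sketch of a fixed random action mapping. It is not BSuite's or Gymnax's actual code: the names `make_action_mapping` and `FixedMappingDeepSea` are made up for illustration, rewards are left out, and I am assuming the mapping is a per-cell binary mask where the action equal to the mask entry moves right.

```python
import random


def make_action_mapping(size, seed):
    """Draw one binary entry per grid cell (hypothetical helper).

    Assumption: the action that equals the entry of the current cell
    moves the agent right, the other action moves it left.
    """
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]


class FixedMappingDeepSea:
    """Simplified DeepSea-like grid without rewards (illustration only)."""

    def __init__(self, size, seed):
        self.size = size
        # The mapping is sampled once here and never touched by reset().
        self.action_mapping = make_action_mapping(size, seed)
        self.reset()

    def reset(self):
        # Only the agent position is reset; the mapping stays as it was.
        self.row, self.col = 0, 0
        return (self.row, self.col)

    def step(self, action):
        go_right = action == self.action_mapping[self.row][self.col]
        if go_right:
            self.col = min(self.col + 1, self.size - 1)
        else:
            self.col = max(self.col - 1, 0)
        self.row += 1
        done = self.row == self.size
        return (self.row, self.col), done
```

With this layout the transitions are fully deterministic, yet an agent that always picks the same action index does not always move right, because which index means "right" differs from cell to cell.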

Currently, Gymnax's DeepSea-bsuite implementation either:

This poses a few problems:

  1. It is not possible to use a random mapping without making the transitions stochastic.
  2. Getting the default behaviour of BSuite, i.e. a fixed random mapping, requires workarounds such as generating the mapping by hand and setting env.action_mapping, or resetting the environment with a fixed key, which is not ideal for general agent-environment loops.

I think problem 1 is just a bug: the deterministic environment parameter should probably only distinguish between BSuite's "DeepSea" and "DeepSea Stochastic" environments.

Problem 2 could perhaps be fixed by changing the default env.action_mapping, or adding the action_mapping to env_state.
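As a rough sketch of the second option, the mapping could be sampled once when the state is first created and then carried through every reset. This is plain Python rather than JAX, and `EnvState`, `sample_mapping`, `init_state` and `reset` are hypothetical names chosen for the example, not Gymnax's API. In particular, Gymnax's real `reset` does not receive the previous state, so this only shows the intended semantics.

```python
import random
from typing import NamedTuple, Tuple


class EnvState(NamedTuple):
    """Hypothetical state that carries the action mapping along."""
    row: int
    col: int
    action_mapping: Tuple[Tuple[int, ...], ...]


def sample_mapping(seed, size):
    # One binary entry per cell, drawn once from the given seed.
    rng = random.Random(seed)
    return tuple(
        tuple(rng.randint(0, 1) for _ in range(size)) for _ in range(size)
    )


def init_state(seed, size):
    # Called once per environment instance: samples the mapping.
    return EnvState(row=0, col=0, action_mapping=sample_mapping(seed, size))


def reset(state):
    # Called at every episode boundary: the position is reset,
    # the mapping is reused unchanged.
    return state._replace(row=0, col=0)
```

The point of the split is that only `init_state` consumes randomness, so a general agent-environment loop can call `reset` as often as it likes without changing the mapping.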

Finally, the randomize_actions environment parameter is currently unused, and it is unclear to me why the option of sample_action_map exists. Surely randomly generating the action mapping at the start of every episode makes the problem completely impossible to solve?