In BSuite, DeepSea generates a random action mapping and keeps this mapping fixed across resets. The main purpose of a random action mapping is to ensure that a DQN agent cannot trivially solve the environment simply by having a bias towards the action "right".
Currently, Gymnax's DeepSea-bsuite implementation either:

- uses a deterministic action mapping if `deterministic` is True;
- randomly generates an action mapping on every reset if `sample_action_map` is True and `deterministic` is False;
- uses a default action map, which is itself deterministic, when `sample_action_map` is False and `deterministic` is False.
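For concreteness, the branching above amounts to something like the following minimal sketch (the grid size, function name, and mapping representation are assumptions for illustration, not Gymnax's actual code):

```python
import jax
import jax.numpy as jnp

SIZE = 8  # hypothetical grid size, for illustration only


def make_action_mapping(key, deterministic, sample_action_map):
    """Sketch of the three branches described above; names are assumptions."""
    if deterministic:
        # Fixed mapping: action 1 always means "right".
        return jnp.ones((SIZE, SIZE), dtype=jnp.int32)
    if sample_action_map:
        # A fresh random mapping drawn from the reset key, every episode.
        return jax.random.bernoulli(key, 0.5, (SIZE, SIZE)).astype(jnp.int32)
    # Default mapping, which is currently deterministic as well.
    return jnp.ones((SIZE, SIZE), dtype=jnp.int32)
```

Note that only the middle branch ever uses the key, so two of the three configurations collapse to the same deterministic mapping.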
This poses a few problems:
1. It is not possible to use a random mapping without making the transitions stochastic.
2. Getting the default behaviour of BSuite, i.e. a fixed random mapping, requires workarounds such as generating the mapping by hand and setting `env.action_mapping`, or resetting the environment with a fixed key, which is not ideal for general agent-environment loops.
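For reference, the fixed-key workaround boils down to something like this sketch (the `reset` signature and grid size are simplified assumptions; the point is only that the mapping key is decoupled from the per-episode keys):

```python
import jax
import jax.numpy as jnp

MAPPING_KEY = jax.random.PRNGKey(42)  # fixed key reserved for the mapping


def reset(episode_key, size=8):
    """Simplified reset: the mapping deliberately ignores episode_key."""
    # Hand-generated mapping from the fixed key, mimicking what one would
    # assign to env.action_mapping before running the agent-environment loop.
    action_mapping = jax.random.bernoulli(
        MAPPING_KEY, 0.5, (size, size)
    ).astype(jnp.int32)
    # Start at the top-left corner, as in DeepSea.
    start_state = (jnp.int32(0), jnp.int32(0))
    return start_state, action_mapping
```

This works, but it forces every training loop to know about DeepSea's internals, which is exactly what a general agent-environment interface should avoid.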
I think problem 1 is just a bug: the `deterministic` environment parameter should probably just distinguish between BSuite's "DeepSea" and "DeepSea Stochastic" environments.
Problem 2 could perhaps be fixed by changing the default `env.action_mapping`, or by adding the action mapping to `env_state`.
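One way to realize the second option is to thread the mapping through the state so that later resets can reuse it; a minimal sketch, where the `EnvState` fields and `reset` signature are hypothetical:

```python
from typing import NamedTuple, Optional

import jax
import jax.numpy as jnp


class EnvState(NamedTuple):
    """Hypothetical DeepSea state that carries its own action mapping."""
    row: jnp.ndarray
    column: jnp.ndarray
    action_mapping: jnp.ndarray


def reset(key, prev_state: Optional[EnvState] = None, size=8):
    # Sample the mapping only on the very first reset; afterwards it is
    # carried along in the state and reused unchanged.
    if prev_state is None:
        mapping = jax.random.bernoulli(key, 0.5, (size, size)).astype(jnp.int32)
    else:
        mapping = prev_state.action_mapping
    return EnvState(row=jnp.int32(0), column=jnp.int32(0),
                    action_mapping=mapping)
```

Keeping the mapping in the state also plays well with `jit`/`vmap`, since it becomes ordinary traced data rather than a mutable attribute on the environment object.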
Finally, the `randomize_actions` environment parameter is currently unused, and it is unclear to me why the `sample_action_map` option exists. Surely randomly generating the action mapping at the start of every episode makes the problem completely impossible to solve?