Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License
6.84k stars 762 forks source link

[Proposal] Improve `check_env` #927

Closed Kallinteris-Andreas closed 6 months ago

Kallinteris-Andreas commented 7 months ago

Proposal

Hey, one issue in Gymansium-Robotics was discovered, which could have been avoided if check_env was improved

Critique 1

utils.check_env.data_equivilance makes all_close comparisons, instead of equality comparison, Line 57 at: https://github.com/Farama-Foundation/Gymnasium/blob/806d8afe9358d9ff2b092f4a4d7ccf0b320690a8/gymnasium/utils/env_checker.py#L33-L63

proposed solution

add an exact argument to data_equivilance and it should be used as True when env.spec.nondeterministic=False.

Note: all gymnasium & gymnasium-robotics environments pass with exact=True

Critique 2

It does not test that env.reset() properly resets the internal state, it just checks that initial observation is the same

proposed solution

for now, we can not have a solution that works across environments

Critique 3

It does test env.step()

proposed solution

add a simple check_step test function, which assumes that reset(seed) is deterministic (tested before running this function) and asserts that step is deterministic.

Motivation

No response

Pitch

No response

Alternatives

No response

Additional context

No response

Checklist

Kallinteris-Andreas commented 6 months ago

@pseudo-rnd-thoughts regarding Critique 1 1) is it a good idea to change the default exact argument of data_equivilance to True? 2) or change the make function call to include exact=True (at least inside env_checker.py)?

pseudo-rnd-thoughts commented 6 months ago

Sorry I didn't respond when this was originally posted. I'm worried about using allequal rather than allclose due to float point numbers particularly for robotics environments.

An alternative solution for check_env is to use both allclose and allequal. For a given observation, we first check if allequal, if False then run allclose, if all close fails then we raise an error about that otherwise we just log to the users the non-equalness as an environment can be deterministic to e-8 but not equivalent.

I know from talking to the Mujoco devs that due to physics changes / optimisations that mujoco versions are not deterministic and can produce different observations however within e-10 or so I believe.

Kallinteris-Andreas commented 6 months ago

1) there is no point in running allequal and then if it fails allclose, this is effectively the same as running allclose 2) for a given version of the mujuco library, the behavior will be 100% exact (unless we or the mujoco team make a mistake)

RedTachyon commented 6 months ago

Keep in mind checking floats for equality is always asking for trouble. Float operations are not associative, and strictly speaking, it's not even true that a == a (nan) or a + b == a + b (precision fuckery). If there's any GPU involved (which might be the case even in mujoco due to the brax stuff?), nothing is deterministic at that point due to the parallel execution + nonassociativity.

Some opt-in flag to enable "strict" mode might be alright, but it's still a bit of a footgun threat.

Kallinteris-Andreas commented 6 months ago

You make a good point we should not change the default Behaviour of data_eqivilence

but for check_env I believe we should use exact=true.

pseudo-rnd-thoughts commented 6 months ago

there is no point in running allequal and then if it fails allclose, this is effectively the same as running allclose

My suggestion would run allequal first then allclose so there are cases where the first would fail but the second won't

Kallinteris-Andreas commented 6 months ago

@pseudo-rnd-thoughts do we agree that: allclose(DATA) == allequal(DATA) or allclose(DATA)

pseudo-rnd-thoughts commented 6 months ago

Yes, but my suggestion is to run allequal first and raise only a warning if false, only then to run allclose as a way of detecting issues

Kallinteris-Andreas commented 6 months ago

I do not see the benefit to that. since check_env makes operations deterministically.