Farama-Foundation / Gymnasium

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
https://gymnasium.farama.org
MIT License

[Bug Report] check_step_determinism obscur working #1111

Open qgallouedec opened 5 months ago

qgallouedec commented 5 months ago

Describe the bug

It's the kind of issue that's hard to name or explain, or even reduce to simple code. But here's what I've observed since check_step_determinism was added: when I run the equivalent check myself, it passes; when the checker runs it, it fails. For the moment the reproduction depends on panda_gym, sorry for that; I'll reduce it later, but I wanted to post it as soon as possible.

Code example

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence

env = gym.make("PandaPickAndPlace-v3").unwrapped

seed = 123
env.action_space.seed(seed)
action = env.action_space.sample()
_, _ = env.reset(seed=seed)
obs_0, _, _, _, _ = env.step(action)
_, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action)

assert data_equivalence(obs_0, obs_1)  # Passes

check_env(env, skip_render_check=True)  # But this fails in check_step_determinism

Traceback (most recent call last):
  File "/Users/quentingallouedec/panda-gym/94.py", line 16, in <module>
    check_env(env, skip_render_check=True)  # fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/quentingallouedec/panda-gym/env/lib/python3.11/site-packages/gymnasium/utils/env_checker.py", line 402, in check_env
    check_step_determinism(env)
  File "/Users/quentingallouedec/panda-gym/env/lib/python3.11/site-packages/gymnasium/utils/env_checker.py", line 198, in check_step_determinism
    assert data_equivalence(
           ^^^^^^^^^^^^^^^^^
AssertionError: Deterministic step observations are not equivalent for the same seed and action

What's even weirder, is that it only happens in two environments. I'll keep digging and let you know.

System info

Gymnasium 1.0.0a2, Panda-gym 10c4d8a

Additional context

No response

Kallinteris-Andreas commented 4 months ago

The only thing I can think of is that perhaps your environment does not properly reset its internal state after the second reset.

Does this pass?:

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence

env = gym.make("PandaPickAndPlace-v3").unwrapped

check_env(env, skip_render_check=True)  # does this fail??
qgallouedec commented 4 months ago

It fails. Doesn't make any sense 😅

import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence
import traceback

env = gym.make("PandaPickAndPlace-v3").unwrapped

try:
    check_env(env, skip_render_check=True)  # Fails
except Exception as exc:
    traceback.print_exception(exc)

seed = 123
env.action_space.seed(seed)
action = env.action_space.sample()
_, _ = env.reset(seed=seed)
obs_0, _, _, _, _ = env.step(action)
_, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action)

assert data_equivalence(obs_0, obs_1)  # Passes

Traceback (most recent call last):
  File "/Users/quentingallouedec/panda-gym/94.py", line 9, in <module>
    check_env(env, skip_render_check=True)  # Fails
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/quentingallouedec/Gymnasium/gymnasium/utils/env_checker.py", line 412, in check_env
    check_reset_seed_determinism(env)
  File "/Users/quentingallouedec/Gymnasium/gymnasium/utils/env_checker.py", line 116, in check_reset_seed_determinism
    assert data_equivalence(
           ^^^^^^^^^^^^^^^^^
AssertionError: Using `env.reset(seed=123)` then `env.reset()` is non-deterministic as the observations are not equivalent.
Kallinteris-Andreas commented 4 months ago

I have no idea how this is failing.

check_step_determinism does the same thing: https://github.com/Farama-Foundation/Gymnasium/blob/a09dcfdcf08c4d7417f98c25f5bfec9ab2ff110d/gymnasium/utils/env_checker.py#L188-L218

You could try adding a breakpoint() at line 210 and printing obs_0 and obs_1; that might reveal something.
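
As a lighter-weight alternative to a breakpoint, a small recursive helper can report exactly which keys of a dict observation diverge and by how much (a sketch in plain NumPy; `diff_obs` is a hypothetical name, not part of Gymnasium):

```python
import numpy as np

def diff_obs(obs_a, obs_b, atol=0.0, path=""):
    """Recursively compare two observations (dicts of arrays, or arrays)
    and return a list of (path, max_abs_difference) entries that differ."""
    diffs = []
    if isinstance(obs_a, dict):
        for key in obs_a:
            diffs += diff_obs(obs_a[key], obs_b[key], atol, f"{path}/{key}")
    else:
        a, b = np.asarray(obs_a), np.asarray(obs_b)
        max_err = float(np.max(np.abs(a - b))) if a.size else 0.0
        if max_err > atol:
            diffs.append((path, max_err))
    return diffs
```

Calling `diff_obs(obs_0, obs_1)` inside the failing check would narrow the mismatch down to a specific key without stepping through a debugger.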

pseudo-rnd-thoughts commented 4 months ago

I can reproduce the error, but it seems to require a strange setup.

You must reset, step, reset, step for the second step to fail the equivalence check.

If I change the assertion error to show the data, I can see that obs["observation"] is the problem. Plus, if I subtract the two data points, we can see the difference is in the second half only; this comes from the task.

AssertionError: data_1 - data_2=array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
        0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -4.6938658e-07,
        4.4703484e-07, -3.2596290e-07,  1.1971366e-05, -1.0663000e-05,
        3.4053983e-06,  2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
        1.1928683e-03,  1.7360186e-03,  2.0111194e-04], dtype=float32)

Looking at all the environments, this is a problem for most of them, except Reach and Slide:

import gymnasium as gym
from gymnasium.utils.env_checker import data_equivalence

import panda_gym
from panda_gym.envs import PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv

gym.register_envs(panda_gym)

for env_cls in [PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv]:
    env = env_cls()
    print(f'{env}')

    seed = 123
    env.action_space.seed(seed)
    action_0 = env.action_space.sample()
    action_1 = env.action_space.sample()

    obs_0, _ = env.reset(seed=seed)
    obs_1, _, _, _, _ = env.step(action_0)
    obs_2, _ = env.reset()
    obs_3, _, _, _, _ = env.step(action_1)

    obs_4, _ = env.reset(seed=seed)
    obs_5, _, _, _, _ = env.step(action_0)
    obs_6, _ = env.reset()
    obs_7, _, _, _, _ = env.step(action_1)

    data_equivalence(obs_0, obs_4)
    data_equivalence(obs_1, obs_5)
    print(f'{obs_1["observation"] - obs_5["observation"]=}')
    data_equivalence(obs_2, obs_6)
    data_equivalence(obs_3, obs_7)

All of these differences exist only in the task portion of the observation (not the robot).

pseudo-rnd-thoughts commented 4 months ago

I've found a likely source for the noise in the observation: the _sample_object function in the PickAndPlace task. If you comment out line 83, object_position += noise, which adds the noise to object_position, the error disappears for PickAndPlace. However, if you print the noise value produced, it's the same in the two episodes.

noise=array([0.04380345, 0.00589226, 0.        ])
<PandaPickAndPlaceEnv instance>
noise=array([-0.09722823,  0.09362835,  0.        ])
noise=array([-0.07651062,  0.09727248,  0.        ])
noise=array([-0.09722823,  0.09362835,  0.        ])
noise=array([-0.07651062,  0.09727248,  0.        ])
obs_1["observation"] - obs_5["observation"]=array([ 0.0000000e+00,  0.0000000e+00,  0.0000000e+00,  0.0000000e+00,
        0.0000000e+00,  0.0000000e+00,  0.0000000e+00, -4.6938658e-07,
        4.4703484e-07, -3.2596290e-07,  1.1971366e-05, -1.0663000e-05,
        3.4053983e-06,  2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
        1.1928683e-03,  1.7360186e-03,  2.0111194e-04], dtype=float32)

I can't figure out why adding this noise is causing the output to change. The problem is that if I change the code to

    def _sample_object(self) -> np.ndarray:
        """Randomize start position of object."""
        object_position = np.array([0.0, 0.0, self.object_size / 2])
        noise = self.np_random.uniform(self.obj_range_low, self.obj_range_high)  # still drawn, but unused
        object_position += np.array([-0.09722823, 0.09362835, 0.0])  # constant instead of the sampled noise
        return object_position

the problem persists, even though the added value is now a constant that is identical across episodes.
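
For reference, the repeated values in the log above are consistent with how NumPy generators behave: reseeding reproduces the noise stream exactly, so the RNG itself can be ruled out as the source of divergence. A minimal sketch, using a hypothetical `sample_noise` helper that mimics `_sample_object`'s draw (bounds assumed, not panda_gym's actual values):

```python
import numpy as np

def sample_noise(seed, n, low=-0.1, high=0.1):
    """Mimic the per-episode noise draws after reseeding the generator."""
    rng = np.random.default_rng(seed)
    return [rng.uniform(low, high, size=3) for _ in range(n)]

ep1 = sample_noise(123, 2)  # first seeded episode
ep2 = sample_noise(123, 2)  # second, identically seeded episode
# The draws are identical, matching the repeated noise values in the log
# above, so the random stream is not where the divergence comes from.
assert all(np.array_equal(a, b) for a, b in zip(ep1, ep2))
```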

EDIT: The next day I can't replicate the last point

pseudo-rnd-thoughts commented 4 months ago

Looking at this the next day, I can't replicate the problem I noted at the end.

I tested the minimal example

seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()

obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1)  # This line is necessary

obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1)  # This line isn't necessary for the issue

print(f'{obs_1["observation"] - obs_5["observation"]=}')

Another test I made was to add a third reset case to compare three observations. Interestingly, all three observations are different, meaning there is an unknown source of randomness that is nonetheless deterministic (the observation error is constant across many runs). This is a very strange combination: deterministic unknown randomness.

seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()

obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1)  # necessary

obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1)  # unnecessary

obs_8, _ = env.reset(seed=seed)
obs_9, _, _, _, _ = env.step(action_0)
obs_10, _ = env.reset()
# obs_11, _, _, _, _ = env.step(action_1)  # unnecessary

print(f'{obs_1["observation"] - obs_5["observation"]=}')
print(f'{obs_5["observation"] - obs_9["observation"]=}')
print(f'{obs_1["observation"] - obs_9["observation"]=}')
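
One way such "deterministic unknown randomness" can arise is simulator state that survives reset(): the observation then depends on the call history rather than on the seed, so it differs between identically seeded episodes yet is identical across runs. A toy sketch (hypothetical `TinySim`, standing in for a physics engine with persistent internal state; not panda_gym code):

```python
import numpy as np

class TinySim:
    """Toy 'physics engine' whose internal state is NOT cleared on reset,
    mimicking residual simulator state (e.g. solver caches)."""
    def __init__(self):
        self.residual = 0.0  # survives reset

    def reset(self, seed=None):
        self.rng = np.random.default_rng(seed)
        self.pos = self.rng.uniform(-0.1, 0.1)
        return self.pos + self.residual  # contaminated by leftover state

    def step(self):
        self.residual += 1e-3  # stepping mutates persistent engine state
        self.pos += 0.01
        return self.pos + self.residual

sim = TinySim()
sim.reset(seed=123); a = sim.step()
sim.reset(seed=123); b = sim.step()
# a != b, yet their difference (1e-3) is the same on every run: the
# mismatch is a deterministic function of call history, not of the seed.
```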

Checking the seeding, by separating the action-space seeding from the reset seeding, I found that only the reset seeding affects the observation, i.e., the actual action taken doesn't matter.

The last check I've made is related to the _sample_object function and the noise. Rechecking, I couldn't replicate the constant noise still causing the issue; however, by modifying the bounds I could avoid it. It seems that if the position is not close to zero, then there isn't an issue. If someone could plot a graph of the errors for different object positions, it would be an interesting way to test this.
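
Such a sweep could be built around a small helper that measures the quantity the checker asserts on, the first-step observation mismatch between two identically seeded episodes. A sketch (hypothetical `reset_step_error`, assuming array observations; for panda_gym one would pass in `obs["observation"]` or an extraction wrapper):

```python
import numpy as np

def reset_step_error(env, seed, action):
    """Max abs difference between the first-step observations of two
    identically seeded episodes of `env` (assumes array observations)."""
    env.reset(seed=seed)
    obs_0 = env.step(action)[0]
    env.reset(seed=seed)
    obs_1 = env.step(action)[0]
    return float(np.max(np.abs(np.asarray(obs_0) - np.asarray(obs_1))))
```

Sweeping seeds (and hence sampled object positions) and plotting this error against the position would show whether the problem really correlates with positions near zero.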