Open qgallouedec opened 5 months ago
The only thing I can think of is that your environment perhaps does not properly reset its internal state after the second reset.
Does this pass?
import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence
env = gym.make("PandaPickAndPlace-v3").unwrapped
check_env(env, skip_render_check=True) # does this fail??
It fails. Doesn't make any sense 😅
import panda_gym
import gymnasium as gym
from gymnasium.utils.env_checker import check_env, data_equivalence
import traceback
env = gym.make("PandaPickAndPlace-v3").unwrapped
try:
    check_env(env, skip_render_check=True) # Fails
except Exception as exc:
    traceback.print_exception(exc)
seed = 123
env.action_space.seed(seed)
action = env.action_space.sample()
_, _ = env.reset(seed=seed)
obs_0, _, _, _, _ = env.step(action)
_, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action)
assert data_equivalence(obs_0, obs_1) # Passes
Traceback (most recent call last):
File "/Users/quentingallouedec/panda-gym/94.py", line 9, in <module>
check_env(env, skip_render_check=True) # Fails
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/quentingallouedec/Gymnasium/gymnasium/utils/env_checker.py", line 412, in check_env
check_reset_seed_determinism(env)
File "/Users/quentingallouedec/Gymnasium/gymnasium/utils/env_checker.py", line 116, in check_reset_seed_determinism
assert data_equivalence(
^^^^^^^^^^^^^^^^^
AssertionError: Using `env.reset(seed=123)` then `env.reset()` is non-deterministic as the observations are not equivalent.
I have no idea how this is failing.
check_step_determinism does the same thing:
https://github.com/Farama-Foundation/Gymnasium/blob/a09dcfdcf08c4d7417f98c25f5bfec9ab2ff110d/gymnasium/utils/env_checker.py#L188-L218
You could try adding a breakpoint() at line 210 and printing obs_0 and obs_1; that might reveal something.
I can reproduce the error, but it seems to require a strange setup.
You must reset, step, reset, step for the second step to fail equivalence.
If I change the error to an assert, I can see that obs["observation"] is the problem. Plus, if I subtract the two data points, the difference appears only in the second half, which comes from the task:
AssertionError: data_1 - data_2=array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00, -4.6938658e-07,
4.4703484e-07, -3.2596290e-07, 1.1971366e-05, -1.0663000e-05,
3.4053983e-06, 2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
1.1928683e-03, 1.7360186e-03, 2.0111194e-04], dtype=float32)
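To make the "second half only" observation easier to check, the subtraction can be turned into a list of differing indices. This is a minimal sketch using plain Python sequences; the helper name differing_indices and the toy vectors are illustrative assumptions, not part of Gymnasium or panda_gym.

```python
from typing import List, Sequence


def differing_indices(a: Sequence[float], b: Sequence[float], tol: float = 0.0) -> List[int]:
    """Return the indices where two equally long vectors differ by more than tol."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if abs(x - y) > tol]


# Toy vectors standing in for two obs["observation"] arrays (hypothetical values).
first = [0.1, 0.2, 0.3, 0.40, 0.50]
second = [0.1, 0.2, 0.3, 0.41, 0.52]
print(differing_indices(first, second))
```

With real observations, passing a small tol (for example 1e-8) would separate exact mismatches from float32 round-off.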
Looking at all the environments, this is a problem for most of them, except Reach and Slide.
import gymnasium as gym
from gymnasium.utils.env_checker import data_equivalence
import panda_gym
from panda_gym.envs import PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv
gym.register_envs(panda_gym)
for env_cls in [PandaPickAndPlaceEnv, PandaFlipEnv, PandaPushEnv, PandaReachEnv, PandaSlideEnv, PandaStackEnv]:
    env = env_cls()
    print(f'{env}')
    seed = 123
    env.action_space.seed(seed)
    action_0 = env.action_space.sample()
    action_1 = env.action_space.sample()
    obs_0, _ = env.reset(seed=seed)
    obs_1, _, _, _, _ = env.step(action_0)
    obs_2, _ = env.reset()
    obs_3, _, _, _, _ = env.step(action_1)
    obs_4, _ = env.reset(seed=seed)
    obs_5, _, _, _, _ = env.step(action_0)
    obs_6, _ = env.reset()
    obs_7, _, _, _, _ = env.step(action_1)
    data_equivalence(obs_0, obs_4)
    data_equivalence(obs_1, obs_5)
    print(f'{obs_1["observation"] - obs_5["observation"]=}')
    data_equivalence(obs_2, obs_6)
    data_equivalence(obs_3, obs_7)
All of these differences exist only in the task part of the observation (not the robot).
I've found a sort of source for the noise in the observation.
In the _sample_object function of the PickAndPlace task, if you comment out line 83, which adds the noise to the object position (object_position += noise), the error disappears for PickAndPlace.
However, if you print the noise value produced, it's the same in the two episodes.
noise=array([0.04380345, 0.00589226, 0. ])
<PandaPickAndPlaceEnv instance>
noise=array([-0.09722823, 0.09362835, 0. ])
noise=array([-0.07651062, 0.09727248, 0. ])
noise=array([-0.09722823, 0.09362835, 0. ])
noise=array([-0.07651062, 0.09727248, 0. ])
obs_1["observation"] - obs_5["observation"]=array([ 0.0000000e+00, 0.0000000e+00, 0.0000000e+00, 0.0000000e+00,
0.0000000e+00, 0.0000000e+00, 0.0000000e+00, -4.6938658e-07,
4.4703484e-07, -3.2596290e-07, 1.1971366e-05, -1.0663000e-05,
3.4053983e-06, 2.5060897e-05, -7.6492070e-06, -2.2978888e-05,
1.1928683e-03, 1.7360186e-03, 2.0111194e-04], dtype=float32)
I can't figure out why adding this noise causes the output to change. The problem is that if I change the code to
def _sample_object(self) -> np.ndarray:
    """Randomize start position of object."""
    object_position = np.array([0.0, 0.0, self.object_size / 2])
    noise = self.np_random.uniform(self.obj_range_low, self.obj_range_high)
    object_position += np.array([-0.09722823, 0.09362835, 0. ])
    return object_position
the problem persists, even though the noise being added is now a constant.
EDIT: The next day, I can't replicate the last point.
Looking at this again the next day, I can't replicate the problem I noted at the end of my previous comment.
I tested the minimal example:
seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()
obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1) # This line is necessary
obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1) # This line isn't necessary for the issue
print(f'{obs_1["observation"] - obs_5["observation"]=}')
Another test I made was to add a third seeded reset in order to compare three observations. Interestingly, all three observations differ, which means there is an unknown source of variation that is nonetheless deterministic (the observation error is constant across many runs). This is a very strange combination: the variation is unexplained, yet fully repeatable.
seed = 123
env.action_space.seed(seed)
action_0 = env.action_space.sample()
action_1 = env.action_space.sample()
obs_0, _ = env.reset(seed=seed)
obs_1, _, _, _, _ = env.step(action_0)
obs_2, _ = env.reset()
obs_3, _, _, _, _ = env.step(action_1) # necessary
obs_4, _ = env.reset(seed=seed)
obs_5, _, _, _, _ = env.step(action_0)
obs_6, _ = env.reset()
# obs_7, _, _, _, _ = env.step(action_1) # unnecessary
obs_8, _ = env.reset(seed=seed)
obs_9, _, _, _, _ = env.step(action_0)
obs_10, _ = env.reset()
# obs_11, _, _, _, _ = env.step(action_1) # unnecessary
print(f'{obs_1["observation"] - obs_5["observation"]=}')
print(f'{obs_5["observation"] - obs_9["observation"]=}')
print(f'{obs_1["observation"] - obs_9["observation"]=}')
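As an illustration of how a difference can be repeatable across runs yet vary between seeded resets, here is a toy sketch in which reset() reseeds the RNG but leaves a hidden counter untouched. LeakyToyEnv and its cache attribute are hypothetical, it is not panda_gym code, and it is not a claim about the actual cause of this issue.

```python
import random


class LeakyToyEnv:
    """Toy stand-in (not panda_gym): reset() reseeds but forgets to clear a cache."""

    def __init__(self) -> None:
        self.rng = random.Random()
        self.position = 0.0
        self.cache = 0.0  # hidden state that survives reset()

    def reset(self, seed=None) -> float:
        if seed is not None:
            self.rng.seed(seed)
        self.position = self.rng.uniform(-0.1, 0.1)
        # Bug being illustrated: self.cache is deliberately NOT cleared here.
        return self.position

    def step(self, action: float) -> float:
        # The leftover cache slightly perturbs the result of every step.
        self.position += action + 1e-3 * self.cache
        self.cache += 1.0
        return self.position


env = LeakyToyEnv()
env.reset(seed=123)
obs_1 = env.step(0.5)
env.reset()
env.step(0.25)
env.reset(seed=123)
obs_5 = env.step(0.5)
env.reset()
env.reset(seed=123)
obs_9 = env.step(0.5)
print(obs_1 == obs_5, obs_5 == obs_9, obs_1 == obs_9)
```

Running the script repeatedly gives the same three values each time, because nothing in it is truly random, while the three seeded episodes still disagree with each other. The toy does not reproduce every detail reported here, such as the intermediate step being necessary.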
Checking the seeding by separating the action space seeding from the reset seeding, only the reset seeding affects the observation, i.e., the actual action taken doesn't matter.
The last check I've made is related to the _sample_object function and the noise.
Rechecking, I couldn't replicate the constant noise still causing the issue; however, by modifying the bounds I could avoid it.
It seems that if the position is not close to zero, there isn't an issue.
If someone could plot a graph of the errors for different object positions, that could be an interesting way to prove this.
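A sweep like that could be organised roughly as follows. This is only a sketch: sweep_errors and fake_rollout are hypothetical names, and fake_rollout is a purely synthetic placeholder that would have to be replaced by code that fixes the object position in the environment and returns the two observations to compare.

```python
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]


def sweep_errors(
    rollout: Callable[[float], Tuple[Observation, Observation]],
    positions: Sequence[float],
) -> List[Tuple[float, float]]:
    """For each position, record the largest absolute difference between two observations."""
    results = []
    for pos in positions:
        first, second = rollout(pos)
        error = max(abs(x - y) for x, y in zip(first, second))
        results.append((pos, error))
    return results


def fake_rollout(pos: float) -> Tuple[Observation, Observation]:
    """Synthetic placeholder, NOT measured data: mismatch only very close to zero."""
    offset = 1e-6 if abs(pos) < 0.01 else 0.0
    return [pos], [pos + offset]


positions = [i * 0.05 for i in range(-2, 3)]
table = sweep_errors(fake_rollout, positions)
for pos, error in table:
    print(f"{pos:+.2f} {error:.1e}")
```

The resulting (position, error) pairs can then be passed to any plotting tool.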
Describe the bug
It's the kind of issue that's hard to name or explain, or even reduce to simple code. But here's what I've observed since check_step_determinism was added: when I do the check myself, it passes. When the checker does it, it doesn't. For the moment the code depends on panda_gym, sorry for that. I'll reduce it in the future, but I wanted to post it as soon as possible.
Code example
What's even weirder, is that it only happens in two environments. I'll keep digging and let you know.
System info
Gymnasium 1.0.0a2 Panda-gym 10c4d8a
Additional context
No response