Closed amacati closed 4 months ago
Can you try replacing the entire body of def _reset_sim(self):
with a single mj_resetData
mujoco.mj_resetData(self.model, self.data)
Yes, that's a more principled solution. I had to add it to the Fetch environments separately because they redefine RobotEnv._reset_sim()
. Also, I added the mujoco_py
equivalent to the deprecated environments. This passes all tests for me now, both for mujoco
environments and legacy mujoco_py
envs.
I have resolved your comments and removed the modifications to the mujoco-py
environments. Since they are expected to fail the new tests I have limited those to the mujoco
environments and documented the reason in the test description.
test_same_env_determinism_rollout
is effectively a reimplementation of gymnasium.utils.env_match.check_environments_match
, I do not believe it has any additional utility, if you agree remove it.You can use this test instead, to validate the change, which is better because it actually asserts that reset properly resets the environment state (note: check_mujoco_reset_state
will be importable with from gymnasium.envs.mujoco.utils import check_mujoco_reset_state
)
import mujoco
import numpy as np
import gymnasium
def get_state(
env: gymnasium.envs.mujoco.MujocoEnv,
state_type: mujoco.mjtState = mujoco.mjtState.mjSTATE_PHYSICS,
):
"""Gets the state of `env`.
Arguments:
env: Environment whose state to copy, `env.model` & `env.data` must be accessible.
state_type: see the [documentation of mjtState](https://mujoco.readthedocs.io/en/stable/APIreference/APItypes.html#mjtstate) most users can use the default for training purposes or `mujoco.mjtState.mjSTATE_INTEGRATION` for validation purposes.
"""
assert mujoco.__version__ >= "2.3.6", "Feature requires `mujuco>=2.3.6`"
state = np.empty(mujoco.mj_stateSize(env.unwrapped.model, state_type))
mujoco.mj_getState(env.unwrapped.model, env.unwrapped.data, state, state_type)
return state
def check_mujoco_reset_state(env: gymnasium.envs.mujoco.MujocoEnv, seed=1234):
"""Asserts that `env.reset` properly resets the state (not affected by previous steps), assuming `check_reset_seed` has passed."""
env.action_space.seed(seed)
action = env.action_space.sample()
env.reset(seed=seed)
first_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)
env.step(action)
env.reset(seed=seed)
second_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)
assert np.all(first_reset_state == second_reset_state), "reset is not deterministic"
robot_env
), this take 2 steps
2.1 update the changelog of each of those environments
2.2 update the version number in https://github.com/Farama-Foundation/Gymnasium-Robotics/blob/main/gymnasium_robotics/__init__.pygymnasium==1.0.0a2
I don't think that's correct. The test you mentioned steps through two environments and checks if the information from two distinct environments match at each step. This cannot detect the failure case of this PR since the deviation only occurs after an environment has been reset and stepped through for several times. If we wanted to catch the failure case with the same function, we'd have to run a rollout of one of the two environments before starting the check_environments_match
test.
The current test runs a rollout, resets the same env
object with the same seed, runs a second rollout, and then tests if the outputs of both rollouts match each other. The key difference is that it uses a single environment that steps multiple times before being reset and thus shows if effects like warming up etc are correctly accounted for.
I do think however that the combination of your check_mujoco_reset_state()
test function and the gymnasium check_environments_match
test cover it as well.
When bumping the environment version number, do we replace all the v2s, should we issue a deprecation error or is that done automatically? And do the required changes to the documentation also go into this PR?
1) yes bump all changed environments
2) deprecation warning are handled by gymnasium.make
3) yes, documentation should be changed in this PR
Hopefully the last question: Some envs are inheriting from MujocoRobotEnv, but the behavior seems to be unchanged. E.g. HandManipulateBlockFull-v1
passes the new tests also without any fixes. The environments with changed behavior are: FetchSlide-v2
, FetchPickAndPlace-v2
, FetchReach-v2
, FetchPush-v2
, FetchSlideDense-v2
, FetchPickAndPlaceDense-v2
, FetchReachDense-v2
, FetchPushDense-v2
, HandReach-v1
, HandReachDense-v1
. Therefore, all other variants of the hand tasks should be unchanged even with the new fixes, and should not be bumped. Is that okay with you?
Since shadow envs are not be affected by the change we do not have to worry about them
we should verify that they remain unchanged though
@rodrigodelazcano What is the policy for keeping old verisons of environments? In this case the change is practically small. And should not cause problems with comparing training performance, reusing policies and offline reinforcement learning with the data set generated from a previous environment version.
Update: I just checked. There are no datasets Minari for these environments).
@Kallinteris-Andreas Any news on when this will be merged?
@amacati waiting for gymnasium==1.0.0a2
which should be out soon(ish)
Note: I changed the changelog in the documentation to be shorter
Description
The gymnasium API allows users to seed the environment on each reset to yield reproducible results. Running the environment with the same seed should always give the exact same results. While the documentation recommends that users should seed reset only once, it does not forbid seeding multiple times.
FetchPickAndPlace-v2 does not yield reproducible results under these conditions. The reset observation is identical, but the observations start deviating at the first environment step using identical actions.
The inconsistencies arise because the internal Mujoco state is not restored properly when _reset_sim() is called. Specifically, the position and quaternions of the mocap bodies are currently not being reset. Furthermore, the Mujoco integrator uses warmstarts and caches the last controls in mjData. In the current implementation, these are also not reset. Only if these four mjData fields are properly restored to their initial states, env.reset(seed=seed) yields reproducible results.
Fixes #207
Type of change
Checklist:
pre-commit
checks withpre-commit run --all-files
(seeCONTRIBUTING.md
instructions to set it up)