amacati commented 7 months ago

Description

The gymnasium API allows users to seed the environment on each reset to yield reproducible results. Running the environment with the same seed should always give the exact same results. While the documentation recommends that users should seed reset only once, it does not forbid seeding multiple times.

FetchPickAndPlace-v2 does not yield reproducible results under these conditions. The reset observation is identical, but the observations start deviating at the first environment step using identical actions.

The inconsistencies arise because the internal Mujoco state is not restored properly when _reset_sim() is called. Specifically, the position and quaternions of the mocap bodies are currently not being reset. Furthermore, the Mujoco integrator uses warmstarts and caches the last controls in mjData. In the current implementation, these are also not reset. Only if these four mjData fields are properly restored to their initial states, env.reset(seed=seed) yields reproducible results.

Fixes #207

Type of change

[x] Bug fix (non-breaking change which fixes an issue)

Checklist:

[x] I have run the pre-commit checks with pre-commit run --all-files (see CONTRIBUTING.md instructions to set it up)
[x] I have commented my code, particularly in hard-to-understand areas
[x] I have made corresponding changes to the documentation
[x] My changes generate no new warnings
[x] I have added tests that prove my fix is effective or that my feature works
[x] New and existing unit tests pass locally with my changes

Kallinteris-Andreas commented 7 months ago

Can you try replacing the entire body of def _reset_sim(self): with a single mj_resetData

        mujoco.mj_resetData(self.model, self.data)

amacati commented 7 months ago

Yes, that's a more principled solution. I had to add it to the Fetch environments separately because they redefine RobotEnv._reset_sim(). Also, I added the mujoco_py equivalent to the deprecated environments. This passes all tests for me now, both for mujoco environments and legacy mujoco_py envs.

amacati commented 7 months ago

I have resolved your comments and removed the modifications to the mujoco-py environments. Since they are expected to fail the new tests I have limited those to the mujoco environments and documented the reason in the test description.

Kallinteris-Andreas commented 7 months ago

your test_same_env_determinism_rollout is effectively a reimplementation of gymnasium.utils.env_match.check_environments_match, I do not believe it has any additional utility, if you agree remove it.

You can use this test instead, to validate the change, which is better because it actually asserts that reset properly resets the environment state (note: check_mujoco_reset_state will be importable with from gymnasium.envs.mujoco.utils import check_mujoco_reset_state)

import mujoco
import numpy as np

import gymnasium

def get_state(
    env: gymnasium.envs.mujoco.MujocoEnv,
    state_type: mujoco.mjtState = mujoco.mjtState.mjSTATE_PHYSICS,
):
    """Gets the state of `env`.
    Arguments:
        env: Environment whose state to copy, `env.model` & `env.data` must be accessible.
        state_type: see the [documentation of mjtState](https://mujoco.readthedocs.io/en/stable/APIreference/APItypes.html#mjtstate) most users can use the default for training purposes or `mujoco.mjtState.mjSTATE_INTEGRATION` for validation purposes.
    """
    assert mujoco.__version__ >= "2.3.6", "Feature requires `mujuco>=2.3.6`"

    state = np.empty(mujoco.mj_stateSize(env.unwrapped.model, state_type))
    mujoco.mj_getState(env.unwrapped.model, env.unwrapped.data, state, state_type)
    return state

def check_mujoco_reset_state(env: gymnasium.envs.mujoco.MujocoEnv, seed=1234):
    """Asserts that `env.reset` properly resets the state (not affected by previous steps), assuming `check_reset_seed` has passed."""
    env.action_space.seed(seed)
    action = env.action_space.sample()

    env.reset(seed=seed)
    first_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)
    env.step(action)

    env.reset(seed=seed)
    second_reset_state = get_state(env, mujoco.mjtState.mjSTATE_INTEGRATION)

    assert np.all(first_reset_state == second_reset_state), "reset is not deterministic"

This will require version bumping for all the affected environments (all based on robot_env), this take 2 steps 2.1 update the changelog of each of those environments 2.2 update the version number in https://github.com/Farama-Foundation/Gymnasium-Robotics/blob/main/gymnasium_robotics/__init__.py
I will likely not be merging this before gymnasium==1.0.0a2

amacati commented 7 months ago

I don't think that's correct. The test you mentioned steps through two environments and checks if the information from two distinct environments match at each step. This cannot detect the failure case of this PR since the deviation only occurs after an environment has been reset and stepped through for several times. If we wanted to catch the failure case with the same function, we'd have to run a rollout of one of the two environments before starting the check_environments_match test.

The current test runs a rollout, resets the same env object with the same seed, runs a second rollout, and then tests if the outputs of both rollouts match each other. The key difference is that it uses a single environment that steps multiple times before being reset and thus shows if effects like warming up etc are correctly accounted for.

I do think however that the combination of your check_mujoco_reset_state() test function and the gymnasium check_environments_match test cover it as well.

amacati commented 7 months ago

When bumping the environment version number, do we replace all the v2s, should we issue a deprecation error or is that done automatically? And do the required changes to the documentation also go into this PR?

Kallinteris-Andreas commented 7 months ago

1) yes bump all changed environments 2) deprecation warning are handled by gymnasium.make 3) yes, documentation should be changed in this PR

amacati commented 7 months ago

Hopefully the last question: Some envs are inheriting from MujocoRobotEnv, but the behavior seems to be unchanged. E.g. HandManipulateBlockFull-v1 passes the new tests also without any fixes. The environments with changed behavior are: FetchSlide-v2, FetchPickAndPlace-v2, FetchReach-v2, FetchPush-v2, FetchSlideDense-v2, FetchPickAndPlaceDense-v2, FetchReachDense-v2, FetchPushDense-v2, HandReach-v1, HandReachDense-v1. Therefore, all other variants of the hand tasks should be unchanged even with the new fixes, and should not be bumped. Is that okay with you?

Kallinteris-Andreas commented 7 months ago

Since shadow envs are not be affected by the change we do not have to worry about them

we should verify that they remain unchanged though

Kallinteris-Andreas commented 6 months ago

@rodrigodelazcano What is the policy for keeping old verisons of environments? In this case the change is practically small. And should not cause problems with comparing training performance, reusing policies and offline reinforcement learning with the data set generated from a previous environment version.

Update: I just checked. There are no datasets Minari for these environments).

amacati commented 5 months ago

@Kallinteris-Andreas Any news on when this will be merged?

Kallinteris-Andreas commented 5 months ago

@amacati waiting for gymnasium==1.0.0a2 which should be out soon(ish)

Kallinteris-Andreas commented 4 months ago

Note: I changed the changelog in the documentation to be shorter

Farama-Foundation / Gymnasium-Robotics

Add new Fetch-v3 and HandReacher-v2 environments (Fix reproducibility issues) #208

Description

Type of change

Checklist: