benelot / pybullet-gym

Open-source implementations of OpenAI Gym MuJoCo environments for use with the OpenAI Gym Reinforcement Learning Research Platform.
https://pybullet.org/

Save/Resume support #42

Closed floringogianu closed 8 months ago

floringogianu commented 4 years ago

Started work on #40.

Because saving and loading with pybullet.saveState() does not work across python processes (which do not share memory), I decided to use the pybullet.saveBullet() API, which saves the state of the environment to disk. This could potentially be expensive, so I added an example in which (on Linux at least) we can save the states to /run/shm/, a RAM-backed tmpfs mount on Fedora- and Ubuntu-based distributions, so nothing actually touches the physical disk.

state_path = "/run/shm/state.bullet"  # RAM-backed on Linux; or somewhere on disk

env.save_env_state(state_path)        # serialize the simulator state to a file
...
env = gym.make(...)                   # possibly in a different python process
env.reset()                           # this is necessary before restoring
env.restore_env_state(state_path)

I think that ideally pybullet would return the serialized state itself so we could pass it around between python processes, but I looked at the C++ backend and that does not look trivial to implement.
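
Until then, a possible workaround is to round-trip the .bullet file through memory and ship the raw bytes between processes (a sketch on top of this PR's helpers; serialize_env_state and deserialize_env_state are hypothetical names):

import os
import tempfile

def serialize_env_state(env):
    # Hypothetical helper: dump the state to a temp file, then return the
    # raw bytes so they can be sent to another process (pipe, queue, ...).
    fd, path = tempfile.mkstemp(suffix=".bullet")
    os.close(fd)
    try:
        env.save_env_state(path)
        with open(path, "rb") as f:
            return f.read()
    finally:
        try:
            os.remove(path)
        except FileNotFoundError:
            pass

def deserialize_env_state(env, blob):
    # Hypothetical helper: write the bytes back out and restore from the file.
    fd, path = tempfile.mkstemp(suffix=".bullet")
    with os.fdopen(fd, "wb") as f:
        f.write(blob)
    env.restore_env_state(path)
    try:
        os.remove(path)
    except FileNotFoundError:
        pass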

Compatibility with OpenAI/gym:

The contributors over at OpenAI/gym haven't decided on an API for saving and resuming state: https://github.com/openai/gym/issues/402.

ALE implements it like this:

snapshot = env.ale.cloneState()
...
env.ale.restoreState(snapshot)

The recommended way for MuJoCo is:

saved_state = env.sim.get_state()
...
env.sim.set_state(saved_state)

Or, if the env is entirely written in python, you can just deepcopy it (see the sketch below). Other envs, such as those based on Box2D, simply have no support for this at all.
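
A deepcopy snapshot looks roughly like this; it only works when the env holds no handles into a C/C++ engine, which is exactly why it fails for pybullet and MuJoCo:

import copy
import gym

env = gym.make("CartPole-v1")          # a pure-python env
env.reset()
snapshot = copy.deepcopy(env)          # snapshot the whole python object
env.step(env.action_space.sample())    # keep stepping the live env
env = snapshot                         # roll back to the saved copy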

Eventually we will need to stay close to whatever API they decide on.

benelot commented 4 years ago

As far as I know, the reset mechanism in pybullet is based on saving the state and loading it again. I would suggest looking into this, as I believe it is not disk-based.
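
For reference, a minimal sketch of that in-memory mechanism (the returned handle is only meaningful inside the process that created it):

import pybullet as p
import pybullet_data

p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.loadURDF("r2d2.urdf")

state_id = p.saveState()           # snapshot kept in the engine's memory
for _ in range(100):
    p.stepSimulation()
p.restoreState(stateId=state_id)   # rewind the simulation to the snapshot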

floringogianu commented 4 years ago

Yes, the first thing I tried was the same mechanism used in env.reset(), as mentioned above; however, it does not work between processes. If you have any pointers on how pybullet.saveState() can be made to work between python processes, beyond the way it is used in env.reset() and in this example, let me know.

mantle2048 commented 3 years ago

Hi, thanks for your great work on saving and restoring state in pybullet-gym; it helped me a lot. However, the following test shows a problem in the implementation.

import gym
import pybulletgym

if __name__ == '__main__':
    env = gym.make("HalfCheetahMuJoCoEnv-v0")
    env.seed(0)
    env.action_space.seed(0)
    obs, done = env.reset(), False
    # Roll the environment forward for a while.
    for i in range(200):
        nx_obs, rew, done, _ = env.step(env.action_space.sample())

    # Save, step once with a fixed action...
    env.save_env_state('state.bullet')
    act = env.action_space.sample()
    state1 = env.step(act)

    # ...then restore and replay the exact same action.
    env.restore_env_state('state.bullet')
    state2 = env.step(act)
    print(state1)
    print(state2)
#  the output of the test
(array([-4.1603717e-01,  1.4024903e-01,  3.2494479e-01,  3.7975675e-01,
        5.4050684e-01,  8.0007428e-01, -1.2108239e+00, -1.9657683e+00,
       -4.3360400e-01, -3.7912399e-01,  5.7443666e-01, -3.6629651e+00,
        2.3992314e+00,  1.3094640e+01, -4.5016697e-03, -4.8106198e+00,
        1.6316326e+01], dtype=float32), -0.6035058938981799, False, {})
(array([-4.1603717e-01,  1.4024903e-01,  3.2494479e-01,  3.7975675e-01,
        5.4050684e-01,  8.0007428e-01, -1.2108239e+00, -1.9657683e+00,
       -4.3360400e-01, -3.7912399e-01,  5.7443666e-01, -3.6629651e+00,
        2.3992314e+00,  1.3094640e+01, -4.5016697e-03, -4.8106198e+00,
        1.6316326e+01], dtype=float32), -0.22421693801879883, False, {})

state1 and state2 should be exactly the same, but the two env.step(act) calls produce different rewards. After debugging, I found that pybullet.saveBullet() and pybullet.saveState() only save the state of the simulator, while the reward is computed in python code.

Specifically, take the env HalfCheetahMuJoCoEnv-v0 as an example: self.robot represents the HalfCheetah robot in the env.

In HalfCheetah, self.pos_after records the robot's current position, and the function calc_potential both updates self.pos_after and computes the potential, which is part of the reward.

After restoring the saved state, the simulator state does change, but self.pos_after does not. Since replaying the same action from the restored state moves the robot to exactly the position the stale self.pos_after already holds, the potential part of the reward in the first env.step() after restoring comes out as zero.
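
To make that concrete, here are toy numbers (hypothetical values, not taken from the run above):

dt = 0.01
pos_after = 5.0        # stale python-side cache: robot x after the post-save step
restored_x = 4.9       # robot x right after restoreState()
new_x = 5.0            # replaying the same action lands at the same x

buggy = (new_x - pos_after) / dt       # 0.0   -- progress computed from the stale cache
correct = (new_x - restored_x) / dt    # ~10.0 -- progress the simulator actually made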

Therefore, I think we should save the current pos_after in save_env_state, and then set pos_after back to the correct value in restore_env_state.

The specific implementation is:

def save_env_state(self, file_path):
    # Remember the python-side position cache alongside the bullet file.
    self.prev_pos = self.robot.pos_after
    self._p.saveBullet(file_path)

def restore_env_state(self, file_path):
    # Put the cached position back so the next calc_potential() sees a
    # value consistent with the restored simulator state.
    self.robot.pos_after = self.prev_pos
    self._p.restoreState(fileName=file_path)
    try:
        os.remove(file_path)
    except FileNotFoundError:
        pass
#  After modification, the output of the test
(array([-0.406764  ,  0.13484241,  0.36529362,  0.31630808,  0.58991945,
        0.8009333 , -1.1393532 , -2.4010131 , -0.4189203 , -0.22211011,
        0.41537938, -3.5352209 ,  1.9927833 , 14.179333  , -0.05656279,
       -3.5729892 , 14.309292  ], dtype=float32), -0.5812075076813373, False, {})
(array([-0.406764  ,  0.13484241,  0.36529362,  0.31630808,  0.58991945,
        0.8009333 , -1.1393532 , -2.4010131 , -0.4189203 , -0.22211011,
        0.41537938, -3.5352209 ,  1.9927833 , 14.179333  , -0.05656279,
       -3.5729892 , 14.309292  ], dtype=float32), -0.5812075076813373, False, {})

Now state1 and state2 are exactly the same.
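
One caveat: self.prev_pos lives only on the in-process wrapper instance, so this fix does not carry over when the state is restored in a different python process, which was the original motivation for using saveBullet(). A variant that persists the python-side bookkeeping next to the .bullet file could look like this (just a sketch; the .meta suffix and the dict layout are illustrative):

import os
import pickle

def save_env_state(self, file_path):
    self._p.saveBullet(file_path)
    # Persist python-side bookkeeping next to the .bullet file so that a
    # restore in another process can recover it too.
    with open(file_path + ".meta", "wb") as f:
        pickle.dump({"pos_after": self.robot.pos_after}, f)

def restore_env_state(self, file_path):
    self._p.restoreState(fileName=file_path)
    with open(file_path + ".meta", "rb") as f:
        self.robot.pos_after = pickle.load(f)["pos_after"]
    for path in (file_path, file_path + ".meta"):
        try:
            os.remove(path)
        except FileNotFoundError:
            pass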