floringogianu closed this issue 8 months ago
As far as I know, the reset mechanism in pybullet is based on saving and loading the state again. I would suggest to look into this as I believe it is not disk-based.
Yes, I first tried to use the same mechanism as `env.reset()`, as I mentioned above. However, it does not work between processes. If you have any pointers on how `pybullet.saveState()` can be used (besides the way it is used in `env.reset()` and in this example here) so that it works between Python processes, let me know.
Hi, thanks for your great work on saving and restoring state in pybullet-gym. It helped me a lot. However, through the following test I found that there is a problem in your implementation.
```python
import gym
import pybulletgym

if __name__ == '__main__':
    env = gym.make("HalfCheetahMuJoCoEnv-v0")
    env.seed(0)
    env.action_space.seed(0)
    obs, done = env.reset(), False
    for i in range(200):
        nx_obs, rew, done, _ = env.step(env.action_space.sample())

    env.save_env_state('state.bullet')
    act = env.action_space.sample()
    state1 = env.step(act)
    env.restore_env_state('state.bullet')
    state2 = env.step(act)
    print(state1)
    print(state2)
```
The output of the test:

```
(array([-4.1603717e-01, 1.4024903e-01, 3.2494479e-01, 3.7975675e-01,
        5.4050684e-01, 8.0007428e-01, -1.2108239e+00, -1.9657683e+00,
        -4.3360400e-01, -3.7912399e-01, 5.7443666e-01, -3.6629651e+00,
        2.3992314e+00, 1.3094640e+01, -4.5016697e-03, -4.8106198e+00,
        1.6316326e+01], dtype=float32), -0.6035058938981799, False, {})
(array([-4.1603717e-01, 1.4024903e-01, 3.2494479e-01, 3.7975675e-01,
        5.4050684e-01, 8.0007428e-01, -1.2108239e+00, -1.9657683e+00,
        -4.3360400e-01, -3.7912399e-01, 5.7443666e-01, -3.6629651e+00,
        2.3992314e+00, 1.3094640e+01, -4.5016697e-03, -4.8106198e+00,
        1.6316326e+01], dtype=float32), -0.22421693801879883, False, {})
```
As expected, state1 and state2 should be exactly the same, but the two `env.step(act)` calls produce different rewards. After debugging, I found that `pybullet.saveBullet()` and `pybullet.saveState()` only save the state of the simulator, while the reward is computed in Python code.
Specifically, take the env HalfCheetahMuJoCoEnv-v0 as an example. `self.robot` represents the HalfCheetah robot in the env. In HalfCheetah, `self.pos_after` records the current position of the robot, and the function `calc_potential` both updates `self.pos_after` and computes the potential, which is part of the reward. After restoring the saved state, the state of the simulator did change, but `self.pos_after` didn't, so the potential reward in the first `env.step()` after restoring the saved state will be zero.
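A tiny pure-Python model makes the mismatch concrete. The names mirror the discussion above, but the dynamics and numbers are made up purely for illustration:

```python
# Minimal model of the bug: the "simulator" state (x) is saved and restored,
# but the Python-side bookkeeping (pos_after) is not.

class ToyEnv:
    dt = 0.01

    def __init__(self):
        self.x = 0.0          # "simulator" state
        self.pos_after = 0.0  # Python-side bookkeeping used for the reward

    def step(self, velocity):
        self.x += velocity * self.dt
        # potential-style reward: forward progress since the last step
        reward = (self.x - self.pos_after) / self.dt
        self.pos_after = self.x
        return self.x, reward

    def save_state(self):
        return self.x          # only the simulator state is saved

    def restore_state(self, saved):
        self.x = saved         # pos_after is NOT restored -> stale bookkeeping

env = ToyEnv()
for _ in range(5):
    env.step(1.0)

saved = env.save_state()
_, r1 = env.step(1.0)       # reward right after the snapshot
env.restore_state(saved)
_, r2 = env.step(1.0)       # same action from the "same" state...
print(r1, r2)               # r1 ~ 1.0 (true progress), r2 == 0.0 (stale pos_after)
```

The second reward is zero because `pos_after` still holds the position reached *after* the snapshot, so the env sees no forward progress.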
Therefore, I think we should save the current `pos_after` in `save_env_state`, and then set `pos_after` back to that value in `restore_env_state`.
The specific implementation is:

```python
import os

def save_env_state(self, file_path):
    self.prev_pos = self.robot.pos_after
    self._p.saveBullet(file_path)

def restore_env_state(self, file_path):
    self.robot.pos_after = self.prev_pos
    self._p.restoreState(fileName=file_path)
    try:
        os.remove(file_path)
    except FileNotFoundError:
        pass
```
After the modification, the output of the test:

```
(array([-0.406764 , 0.13484241, 0.36529362, 0.31630808, 0.58991945,
        0.8009333 , -1.1393532 , -2.4010131 , -0.4189203 , -0.22211011,
        0.41537938, -3.5352209 , 1.9927833 , 14.179333 , -0.05656279,
        -3.5729892 , 14.309292 ], dtype=float32), -0.5812075076813373, False, {})
(array([-0.406764 , 0.13484241, 0.36529362, 0.31630808, 0.58991945,
        0.8009333 , -1.1393532 , -2.4010131 , -0.4189203 , -0.22211011,
        0.41537938, -3.5352209 , 1.9927833 , 14.179333 , -0.05656279,
        -3.5729892 , 14.309292 ], dtype=float32), -0.5812075076813373, False, {})
```
Now state1 and state2 are exactly the same.
Started work on #40.
Because saving and loading using `pybullet.saveState()` does not work across Python processes (which don't share memory), I decided to use the `pybullet.saveBullet()` API, which saves the state of the environment to disk. This could potentially be expensive, so I added an example in which (on Linux at least) we save the states to /run/shm/, a memory-backed (tmpfs) filesystem on Fedora- and Ubuntu-based distributions. I think ideally `pybullet` would return the serialized state so we could pass it around among Python processes, but I looked at the C++ backend and it doesn't look trivial to implement.

Compatibility with OpenAI/gym:
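As a concrete sketch of the memory-backed variant: the helper and path handling below are mine, not part of the repo; only `save_env_state`/`restore_env_state` come from the discussion above.

```python
import os
import uuid

# Prefer the tmpfs mount so .bullet snapshots never hit the physical disk;
# fall back to /tmp when /run/shm does not exist (e.g. on macOS).
SHM_DIR = "/run/shm" if os.path.isdir("/run/shm") else "/tmp"

def snapshot_path():
    # unique file name so several processes can snapshot concurrently
    return os.path.join(SHM_DIR, "state-%s.bullet" % uuid.uuid4().hex)

path = snapshot_path()
# env.save_env_state(path)       # write the snapshot ...
# env.restore_env_state(path)    # ... and rewind, possibly in another process
```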
The contributors over at OpenAI/gym haven't decided on an API for saving and resuming state: https://github.com/openai/gym/issues/402.
ALE implements it like this:
The recommended way for MuJoCo is:
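The code snippets referenced above didn't survive into this thread. From memory (so treat the exact attribute names as assumptions), ALE exposes something like `env.unwrapped.clone_state()` / `env.unwrapped.restore_state(state)`, and mujoco-py exposes `env.sim.get_state()` / `env.sim.set_state(state)`. The common shape in both is an opaque in-memory snapshot, which a pure-Python stand-in can illustrate:

```python
import copy

class SnapshotEnv:
    # Illustrative stand-in, not a real gym env: the method names mimic the
    # ALE-style clone/restore snapshot API.
    def __init__(self):
        self.t = 0
        self.state = [0.0, 0.0]

    def step(self):
        self.t += 1
        self.state[0] += 1.0

    def clone_state(self):
        # return an opaque snapshot, decoupled from the live state
        return copy.deepcopy((self.t, self.state))

    def restore_state(self, snap):
        self.t, self.state = copy.deepcopy(snap)

env = SnapshotEnv()
env.step()
env.step()
snap = env.clone_state()
env.step()
env.restore_state(snap)
print(env.t, env.state[0])   # back to the snapshotted step: 2 2.0
```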
Or, if the env is entirely written in Python, you can just `deepcopy` it. Other envs, such as those based on Box2D, simply don't have support for this. Eventually we will need to get close to whatever API they decide on.
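For the pure-Python case, the `deepcopy` route really is that short. A minimal sketch (`CounterEnv` is an illustrative stand-in, not a gym env):

```python
import copy

class CounterEnv:
    # stand-in for an env whose entire state lives in Python attributes
    def __init__(self):
        self.steps = 0

    def step(self):
        self.steps += 1

env = CounterEnv()
env.step()
snapshot = copy.deepcopy(env)     # the whole env object is the "save file"
env.step()
print(env.steps, snapshot.steps)  # 2 1 -- the snapshot stayed frozen
```

No disk or serialization API is involved, which is exactly why this only works when nothing of the state lives in a C/C++ backend.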