Fix InvertedPendulumMuJoCoEnv done signal.

benelot / pybullet-gym

Open-source implementations of OpenAI Gym MuJoCo environments for use with the OpenAI Gym Reinforcement Learning Research Platform.

https://pybullet.org/

Other

831 stars, 124 forks

Fix InvertedPendulumMuJoCoEnv done signal. #41

Closed floringogianu closed 1 year ago

floringogianu commented 4 years ago

On env.step() the done signal should be a bool, not a tuple. Also check gym.envs.mujoco.inverted_pendulum.py.
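A minimal sketch of why a tuple-valued done signal matters to callers: a non-empty tuple is always truthy, so a loop that tests `if done:` would end the episode on the first step. The variable names and values below are illustrative assumptions, not taken from the environment's code.

```python
# Illustrative values only; not taken from the environment's code.
done_as_tuple = (False,)   # a one-element tuple wrapping the real flag
done_as_bool = False       # the plain flag that callers of env.step() expect

# A non-empty tuple is truthy no matter what it contains.
episode_ends_tuple = bool(done_as_tuple)
episode_ends_bool = bool(done_as_bool)

print(episode_ends_tuple, episode_ends_bool)
```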

LostXine commented 4 years ago

Hi floringogianu, thank you for making this PR. Could you explain why done = not (np.isfinite(state).all() or np.abs(state[1]) > .2) fixes it? In my opinion, we should use done = not np.isfinite(state).all() or (np.abs(state[1]) > .2).any(), because np.abs(state[1]) > .2 gives an np.array here. Correct me if I'm wrong. Thank you.
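The two expressions quoted in the comment above can be compared side by side. This is only a sketch: the function names `done_pr` and `done_suggested`, the 1-D state layout, and the sample values are assumptions made for illustration, and the `bool(...)` wrapper is added here so the result is always a plain Python bool.

```python
import numpy as np

def done_pr(state):
    # Expression quoted from the PR: the `not` applies to the whole `or`.
    return not (np.isfinite(state).all() or np.abs(state[1]) > .2)

def done_suggested(state):
    # Expression suggested in the comment: the `not` applies only to the
    # finiteness check, and .any() reduces the comparison to one value.
    return bool(not np.isfinite(state).all() or (np.abs(state[1]) > .2).any())

# Hypothetical 1-D states; the real observation layout is not shown in the thread.
over_threshold = np.array([0.0, 0.5, 0.0, 0.0])   # finite, |state[1]| > .2
within_limits = np.array([0.0, 0.1, 0.0, 0.0])    # finite, |state[1]| <= .2
non_finite = np.array([0.0, np.nan, 0.0, 0.0])    # contains NaN

for name, s in [("over_threshold", over_threshold),
                ("within_limits", within_limits),
                ("non_finite", non_finite)]:
    print(name, done_pr(s), done_suggested(s))
```

For a finite state, the parenthesized version short-circuits on the finiteness check and never ends the episode, whatever the value of state[1]; the suggested version ends it once the threshold is exceeded.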

benelot commented 3 years ago

@floringogianu Thanks for your contribution! @LostXine seems to have a valid argument here, no?

floringogianu commented 3 years ago

@benelot Sorry for not replying for a while; somehow I missed @LostXine's observation. I haven't worked with pybullet since last year, but I'll find the time to install it again and check this out. If np.abs(state[1]) > .2 is indeed an array (although I don't remember that being the case), then yes, @LostXine's solution is the right one.
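Whether np.abs(state[1]) > .2 is an array depends on the shape of state, which the thread does not settle. One way to check, sketched with two hypothetical observations (neither is taken from the environment):

```python
import numpy as np

# Hypothetical observations; the environment's real state shape is not assumed.
flat_state = np.array([0.0, 0.5, 0.0, 0.0])          # 1-D: state[1] is a scalar
nested_state = np.array([[0.0, 0.0], [0.5, 0.1]])    # 2-D: state[1] is a row

flat_check = np.abs(flat_state[1]) > .2       # zero-dimensional NumPy boolean
nested_check = np.abs(nested_state[1]) > .2   # boolean array of shape (2,)

print(np.ndim(flat_check), np.ndim(nested_check))

# .any() is available in both cases, so reducing with it is safe either way.
print(bool(flat_check.any()), bool(nested_check.any()))
```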