benelot / pybullet-gym

Open-source implementations of OpenAI Gym MuJoCo environments for use with the OpenAI Gym Reinforcement Learning Research Platform.
https://pybullet.org/
Other
814 stars 124 forks source link

Terminal state in inverted_pendulum_env.py #44

Closed bryanbocao closed 4 years ago

bryanbocao commented 4 years ago

Thanks for the efforts in making this repo!

Looks like the code was trying to combine these two lines of code from the original OpenAI Gym:

code0:

notdone = np.isfinite(ob).all() and (np.abs(ob[1]) <= .2)
done = not notdone

into one line of code

code1:

https://github.com/benelot/pybullet-gym/blob/79396fccc1d0e170a092b50137148da48ee2edab/pybulletgym/envs/mujoco/envs/pendulum/inverted_pendulum_env.py#L32

But if we follow the logic to convert code0 to code1-like version step by step, shouldn't it be:

notdone = np.isfinite(ob).all() and (np.abs(ob[1]) <= .2)
done = not notdone

=> step1

done = not np.isfinite(ob).all() and (np.abs(ob[1]) <= .2)

=> step2

done = not (np.isfinite(ob).all() and (np.abs(ob[1]) <= .2))

=> step3

done = (not np.isfinite(ob).all()) or (not np.abs(ob[1]) <= .2)

=> step4

code2:

done = (not np.isfinite(ob).all()) or (np.abs(ob[1]) > .2)

Note the difference between the precedences of bracket and 'not' in code1 and code2.

Update on Y2020M06D08Mon, code1 is equivalent to code2.

pierre-si commented 4 years ago

Hello, According to the python reference, the precedence of the "not" operator is higher than the precedence of the "and" operator which is higher than the precedence of the "or" operator. Which means that: done = (not np.isfinite(ob).all()) or (np.abs(ob[1]) > .2) is equivalent to: done = not np.isfinite(ob).all() or np.abs(ob[1]) > .2 ie the line of code in the repo.
Also,

notdone = np.isfinite(ob).all() and (np.abs(ob[1]) <= .2) 
done = not notdone

is equivalent to: done = not (np.isfinite(ob).all() and (np.abs(ob[1]) <= .2)) not: done = not np.isfinite(ob).all() and (np.abs(ob[1]) <= .2) due to the precedence of "not" over "and", the parenthesis are required.

bryanbocao commented 4 years ago

Hi, thanks for pointing out.

Yes, I agree on both arguments

  1. Which means that: done = (not np.isfinite(ob).all()) or (np.abs(ob[1]) > .2) is equivalent to: done = not np.isfinite(ob).all() or np.abs(ob[1]) > .2

The reason why I retain the parenthesis in (not np.isfinite(ob).all()) and (np.abs(ob[1]) > .2) is to make the logic clear with less confusion.

2.

notdone = np.isfinite(ob).all() and (np.abs(ob[1]) <= .2) 
done = not notdone

is equivalent to:

done = not (np.isfinite(ob).all() and (np.abs(ob[1]) <= .2))

not:

done = not np.isfinite(ob).all() and (np.abs(ob[1]) <= .2)

due to the precedence of "not" over "and", the parenthesis are required.

Thanks for pointing out my previous mistake in step1 but steps from step2 onwards are correct.

done = not (np.isfinite(ob).all() and (np.abs(ob[1]) <= .2))

is the same as my step2.

Thanks!

bryanbocao commented 4 years ago

To conclude, the logic of https://github.com/benelot/pybullet-gym/blob/79396fccc1d0e170a092b50137148da48ee2edab/pybulletgym/envs/mujoco/envs/pendulum/inverted_pendulum_env.py#L32

is equivalent to that in the original OpenAI Gym.