Same seed different results

benelot / pybullet-gym

Open-source implementations of OpenAI Gym MuJoCo environments for use with the OpenAI Gym Reinforcement Learning Research Platform.

https://pybullet.org/


Same seed different results #34

Open GPaolo opened 5 years ago

GPaolo commented 5 years ago

Hi, if I set the seed, I still get different results between runs.

I am attaching a small script to reproduce the issue:

import gym
import pybullet
import pybulletgym

env = gym.make('AntMuJoCoEnv-v0')

env.seed(7)

obs = env.reset()
act = env.action_space.sample()
print(act)

obs = env.reset()
act = env.action_space.sample()
print(act)

The results I obtain are:

[ 0.36626342  0.64521587 -0.06823247  0.3184726   0.49362287  0.06656755
  0.8346416   0.71423185]
[-0.88181156 -0.4371754   0.45050308  0.7611305   0.27231112 -0.43260667
 -0.3026008  -0.3178519 ]

And they change on every run.

I think this is a bug, because the result should always be the same given the same seed. Or am I doing something wrong?

benelot commented 5 years ago

Hello! Thanks for mentioning this! Can you check if you get the same issue on the pybullet envs as well? Since the reset of the state is handled directly by pybullet through its loading and saving mechanism, I do not have any influence on the deterministic execution of different runs.

GPaolo commented 5 years ago

Just tested. To get the same results across different runs, the seed also has to be set for the action space and the observation space:

env.seed(7)
env.action_space.seed(7)
env.observation_space.seed(7)
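
The three calls above can be bundled into one helper. This is a minimal sketch, not part of pybullet-gym: `seed_everything`, `ToySpace` and `ToyEnv` are hypothetical names, and the toy classes only imitate the behaviour reported in this thread (a space that keeps its own random generator, which `env.seed()` does not touch) so that the snippet runs without Gym installed.

```python
import random


def seed_everything(env, seed):
    """Seed the env and both of its spaces (hypothetical helper)."""
    env.seed(seed)
    env.action_space.seed(seed)
    env.observation_space.seed(seed)


class ToySpace:
    """Stand-in for a Gym space: it owns a private random generator."""

    def __init__(self):
        self._rng = random.Random()  # unseeded, so samples differ between runs

    def seed(self, seed):
        self._rng.seed(seed)

    def sample(self):
        return self._rng.uniform(-1.0, 1.0)


class ToyEnv:
    """Stand-in for an env whose seed() does not reach its spaces."""

    def __init__(self):
        self._rng = random.Random()
        self.action_space = ToySpace()
        self.observation_space = ToySpace()

    def seed(self, seed):
        self._rng.seed(seed)  # only the env's own generator is seeded


env_a, env_b = ToyEnv(), ToyEnv()
seed_everything(env_a, 7)
seed_everything(env_b, 7)
# Two independently created envs now draw the same first action.
print(env_a.action_space.sample() == env_b.action_space.sample())  # True
```

With a real environment the same helper should apply unchanged, assuming the env and its spaces expose `.seed()` as in the snippets above.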

To get the same results across different resets, the seed needs to be set again before every reset. This script:

import gym
import pybullet
import pybulletgym

env = gym.make('AntPyBulletEnv-v0')

env.seed(7)
env.action_space.seed(7)
env.observation_space.seed(7)

obs = env.reset()
act = env.action_space.sample()
print(act)

env.seed(7)
env.action_space.seed(7)
env.observation_space.seed(7)

obs = env.reset()
act = env.action_space.sample()
print(act)

returns:

[-0.44954607  0.83736265 -0.20760961  0.75181586 -0.01520521  0.25760308
  0.06112269 -0.45786014]
[-0.44954607  0.83736265 -0.20760961  0.75181586 -0.01520521  0.25760308
  0.06112269 -0.45786014]

This happens with every environment I tested.
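
Why re-seeding gives two identical actions can be illustrated with a plain seeded generator. This is only a sketch using Python's standard `random` module as a stand-in; it assumes the action space behaves like any seeded pseudo-random stream, which is consistent with the output above but is not taken from the pybullet-gym code.

```python
import random

rng = random.Random()

rng.seed(7)
first = rng.random()
second = rng.random()   # same stream, one step further along

rng.seed(7)             # re-seeding rewinds the stream to its start
replay = rng.random()

print(first == replay)  # True: identical draw after re-seeding
print(first == second)  # False: the stream advanced between draws
```

Seeding once and sampling after each reset corresponds to `first` and `second`; seeding before each reset corresponds to `first` and `replay`.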

benelot commented 5 years ago

If you reset the seed in my envs every time, does that help too? If so, we can fix this so that the seed is stored across resets.

GPaolo commented 5 years ago

Yes, I tried to set the seed before every reset with different environments from the repo and it seems to work consistently.

I also think that the .seed() method should set not only the env seed, but also the action_space and observation_space seeds, at least for consistency: in other Gym environments, the only function I had to call to set the seed was .seed().

benelot commented 5 years ago

OK, so we will make that functionality consistent with MuJoCo and make it store the seed across resets.
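
One possible reading of "store the seed across resets" is a thin wrapper that remembers the seed and re-applies it to the env and both spaces before each reset. The sketch below is an assumption about how that could look, not the fix that went into the repository: `ReseedOnReset` is a hypothetical name, and `ToySpace` and `ToyEnv` are stand-ins so the snippet runs without Gym.

```python
import random


class ReseedOnReset:
    """Hypothetical wrapper: stores the seed and re-applies it on every reset."""

    def __init__(self, env):
        self.env = env
        self.action_space = env.action_space
        self.observation_space = env.observation_space
        self._seed = None

    def seed(self, seed):
        self._seed = seed
        self._apply_seed()

    def _apply_seed(self):
        # Seed the env and both spaces, as suggested in the thread.
        self.env.seed(self._seed)
        self.action_space.seed(self._seed)
        self.observation_space.seed(self._seed)

    def reset(self):
        if self._seed is not None:
            self._apply_seed()
        return self.env.reset()


class ToySpace:
    """Stand-in for a Gym space with a private random generator."""

    def __init__(self):
        self._rng = random.Random()

    def seed(self, seed):
        self._rng.seed(seed)

    def sample(self):
        return self._rng.uniform(-1.0, 1.0)


class ToyEnv:
    """Stand-in env: seed() only touches the env's own generator."""

    def __init__(self):
        self._rng = random.Random()
        self.action_space = ToySpace()
        self.observation_space = ToySpace()

    def seed(self, seed):
        self._rng.seed(seed)

    def reset(self):
        return self._rng.random()  # toy initial observation


env = ReseedOnReset(ToyEnv())
env.seed(7)  # a single call, as with other Gym environments

env.reset()
act_1 = env.action_space.sample()
env.reset()
act_2 = env.action_space.sample()
print(act_1 == act_2)  # True: every reset restarts the same random stream
```

Note the design trade-off: re-seeding on each reset makes every episode start from the same random stream, which reproduces the behaviour tested above but may not be wanted when episodes within one run are supposed to differ.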

On Thu, Nov 21, 2019, 15:11 Giuseppe Paolo wrote:

> Yes, I tried to set the seed before every reset with different environments from the repo and it seems to work consistently.
>
> I also think that the .seed() method should set not only the env seed, but also the action_space and the observation_space ones. At least to give consistency, given that in other Gym environments the only function I had to call to set the seed was .seed().
