benelot / pybullet-gym

Open-source implementations of OpenAI Gym MuJoCo environments for use with the OpenAI Gym Reinforcement Learning Research Platform.
https://pybullet.org/

apparent environment seed issues with ant environment #72

Open balisujohn opened 3 years ago

balisujohn commented 3 years ago

The following code checks whether two instances of the PyBullet Ant environment, both seeded with the same value and driven by actions sampled from identically seeded action spaces, produce the same observations. The check fails intermittently on Python 3.5 and consistently on Python 3.6. For the life of me, I can't figure out what is causing the drift between the environment instances.

System Specs:

Ubuntu 18.04, Python 3.6 (also verified with Python 3.5)



import gym
import pybulletgym
import numpy as np

if __name__ == "__main__":

    env1 = gym.make("AntPyBulletEnv-v0")
    env1.seed(0)
    env1.action_space.seed(0)

    env2 = gym.make("AntPyBulletEnv-v0")
    env2.seed(0)
    env2.action_space.seed(0)

    obs1 = env1.reset()
    obs2 = env2.reset()
    for i in range(100):
        if not np.array_equal(obs1 ,obs2 ):
            for e1,e2 in zip(obs1,obs2):
                if e1 != e2:
                    print(e1,e2)
            exit("failed on obs")
        action1 = env1.action_space.sample()
        action2 = env2.action_space.sample()
        if not np.array_equal(action1, action2):
            print(action1, action2)
            for a1,a2 in zip(action1,action2):
                if a1 != a2:
                    # print the mismatching action components (a1, a2), not the observation variables
                    print(a1,a2)
            exit("failed on action")
        print("env 1")
        obs1, reward, done1, info = env1.step(action1)
        print(action1, obs1)
        print("env 2")
        obs2, reward, done2, info = env2.step(action2)
        print(action2, obs2)
        if done1:
            assert(done2)
            if not np.array_equal(obs1 ,obs2 ):
                for e1,e2 in zip(obs1,obs2):
                    if e1 != e2:
                        print(e1,e2)
                exit("failed on obs")
            obs1 = env1.reset()
            obs2 = env2.reset()

## Output
...
env 1
[-0.19167283 -0.24867578  0.57644254 -0.6455737   0.5354068   0.95332575
 -0.48177752  0.2555853 ] [-0.22058617 -0.07561598  0.997137    0.10426999 -0.00999914  0.11529232
  0.162765   -0.11414797 -1.0032024   0.16360427  0.50044924  0.0312518
 -0.58839995  0.32609588 -0.03842217 -0.13290787  0.2856137   0.26151842
  1.1269124   0.05937529 -0.35873523  0.02834896 -0.61619514  0.8499997
  1.          0.          0.          0.        ]
env 2
[-0.19167283 -0.24867578  0.57644254 -0.6455737   0.5354068   0.95332575
 -0.48177752  0.2555853 ] [-0.22060278 -0.07551313  0.9971448   0.10177492 -0.01480584  0.11399316
  0.1625488  -0.11415483 -1.0028749   0.17036478  0.49990714  0.03061081
 -0.58719516  0.33724806 -0.03812427 -0.13370141  0.28572914  0.2649531
  1.1260623   0.05638258 -0.3578396   0.03535499 -0.6153823   0.8513867
  1.          0.          0.          0.        ]
-0.22058617 -0.22060278
-0.07561598 -0.07551313
0.997137 0.9971448
0.10426999 0.101774916
-0.009999137 -0.014805844
0.11529232 0.11399316
0.162765 0.1625488
-0.11414797 -0.11415483
-1.0032024 -1.0028749
0.16360427 0.17036478
0.50044924 0.49990714
0.031251803 0.030610807
-0.58839995 -0.58719516
0.32609588 0.33724806
-0.038422175 -0.03812427
-0.13290787 -0.13370141
0.2856137 0.28572914
0.26151842 0.2649531
1.1269124 1.1260623
0.059375294 0.056382578
-0.35873523 -0.3578396
0.028348956 0.035354994
-0.61619514 -0.6153823
0.8499997 0.8513867
failed on obs
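
To judge whether the mismatch is floating-point noise or real divergence, it may help to look at the size of each difference instead of testing exact equality. The following is a minimal sketch and not part of the original script. The helper name `diverging_components` and the `tol` parameter are hypothetical. The sample values are the first four observation components from the output above.

```python
def diverging_components(obs_a, obs_b, tol=0.0):
    """Return (index, a, b, abs_diff) for each pair differing by more than tol.

    Hypothetical helper for illustration; works on any two equal-length
    sequences of numbers, including 1-D NumPy arrays.
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("observations must have the same length")
    return [
        (i, a, b, abs(a - b))
        for i, (a, b) in enumerate(zip(obs_a, obs_b))
        if abs(a - b) > tol
    ]


# First four observation components of env 1 and env 2 from the output above.
env1_obs = [-0.22058617, -0.07561598, 0.997137, 0.10426999]
env2_obs = [-0.22060278, -0.07551313, 0.9971448, 0.10177492]

# tol=0.0 reproduces the exact-equality check used in the script.
exact = diverging_components(env1_obs, env2_obs)
# A looser tolerance keeps only the larger differences.
loose = diverging_components(env1_obs, env2_obs, tol=1e-3)

print(len(exact), [i for i, *_ in loose])
```

On these four values, every component differs under exact comparison, and only the component at index 3 differs by more than 1e-3. Differences of that size after identical actions look larger than rounding error in a single float32 value, though this sketch alone does not identify the cause.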