The following code checks whether two instances of the PyBullet Ant environment produce identical observations when both are seeded identically and stepped with actions sampled from identically seeded action spaces. The check fails intermittently on Python 3.5 and consistently on Python 3.6. For the life of me, I can't figure out what is causing the drift between the environment instances.
System Specs:
Ubuntu 18.04
Python 3.6 (also verified with Python 3.5)
import sys

import gym
import pybulletgym  # importing registers the PyBullet environments with gym
import numpy as np

if __name__ == "__main__":
    # Two instances of the same environment, seeded identically.
    env1 = gym.make("AntPyBulletEnv-v0")
    env1.seed(0)
    env1.action_space.seed(0)
    env2 = gym.make("AntPyBulletEnv-v0")
    env2.seed(0)
    env2.action_space.seed(0)

    obs1 = env1.reset()
    obs2 = env2.reset()
    for i in range(100):
        # Observations must match before every step.
        if not np.array_equal(obs1, obs2):
            for e1, e2 in zip(obs1, obs2):
                if e1 != e2:
                    print(e1, e2)
            sys.exit("failed on obs")

        # Identically seeded action spaces should sample identical actions.
        action1 = env1.action_space.sample()
        action2 = env2.action_space.sample()
        if not np.array_equal(action1, action2):
            print(action1, action2)
            for a1, a2 in zip(action1, action2):
                if a1 != a2:
                    print(a1, a2)
            sys.exit("failed on action")

        print("env 1")
        obs1, reward, done1, info = env1.step(action1)
        print(action1, obs1)
        print("env 2")
        obs2, reward, done2, info = env2.step(action2)
        print(action2, obs2)

        # Episodes should end together; compare once more, then reset both.
        if done1:
            assert done2
            if not np.array_equal(obs1, obs2):
                for e1, e2 in zip(obs1, obs2):
                    if e1 != e2:
                        print(e1, e2)
                sys.exit("failed on obs")
            obs1 = env1.reset()
            obs2 = env2.reset()
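To narrow down where the two instances first part ways, one option is to record a single action sequence and replay it into both, which takes the action sampler out of the picture. The sketch below is only an illustration: `first_divergence` and `make_toy_step` are hypothetical names, and the toy step functions stand in for the real environments so the snippet runs without PyBullet.

```python
import numpy as np


def first_divergence(step_a, step_b, actions):
    """Replay the same actions through two step functions.

    Returns (step_index, max_abs_difference) for the first step whose
    observations differ, or None if they never differ.
    """
    for i, action in enumerate(actions):
        obs_a = np.asarray(step_a(action))
        obs_b = np.asarray(step_b(action))
        if not np.array_equal(obs_a, obs_b):
            return i, float(np.max(np.abs(obs_a - obs_b)))
    return None


def make_toy_step(perturb_at=None):
    """Hypothetical stand-in for an environment: accumulates the actions.

    If perturb_at is given, a small offset is added at that step to
    imitate drift.
    """
    state = np.zeros(3)
    count = 0

    def step(action):
        nonlocal state, count
        state = state + action
        if count == perturb_at:
            state = state + 1e-3
        count += 1
        return state.copy()

    return step


# One recorded action sequence, shared by both instances.
rng = np.random.RandomState(0)
actions = [rng.uniform(-1.0, 1.0, size=3) for _ in range(10)]

same = first_divergence(make_toy_step(), make_toy_step(), actions)
drifted = first_divergence(make_toy_step(), make_toy_step(perturb_at=4), actions)
print(same)
print(drifted[0])
```

With the environments from the script above, the step functions could be something like `lambda a: env1.step(a)[0]` and `lambda a: env2.step(a)[0]`, assuming `step` returns the observation first as it does in that script.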
## Output
...
env 1
[-0.19167283 -0.24867578 0.57644254 -0.6455737 0.5354068 0.95332575
-0.48177752 0.2555853 ] [-0.22058617 -0.07561598 0.997137 0.10426999 -0.00999914 0.11529232
0.162765 -0.11414797 -1.0032024 0.16360427 0.50044924 0.0312518
-0.58839995 0.32609588 -0.03842217 -0.13290787 0.2856137 0.26151842
1.1269124 0.05937529 -0.35873523 0.02834896 -0.61619514 0.8499997
1. 0. 0. 0. ]
env 2
[-0.19167283 -0.24867578 0.57644254 -0.6455737 0.5354068 0.95332575
-0.48177752 0.2555853 ] [-0.22060278 -0.07551313 0.9971448 0.10177492 -0.01480584 0.11399316
0.1625488 -0.11415483 -1.0028749 0.17036478 0.49990714 0.03061081
-0.58719516 0.33724806 -0.03812427 -0.13370141 0.28572914 0.2649531
1.1260623 0.05638258 -0.3578396 0.03535499 -0.6153823 0.8513867
1. 0. 0. 0. ]
-0.22058617 -0.22060278
-0.07561598 -0.07551313
0.997137 0.9971448
0.10426999 0.101774916
-0.009999137 -0.014805844
0.11529232 0.11399316
0.162765 0.1625488
-0.11414797 -0.11415483
-1.0032024 -1.0028749
0.16360427 0.17036478
0.50044924 0.49990714
0.031251803 0.030610807
-0.58839995 -0.58719516
0.32609588 0.33724806
-0.038422175 -0.03812427
-0.13290787 -0.13370141
0.2856137 0.28572914
0.26151842 0.2649531
1.1269124 1.1260623
0.059375294 0.056382578
-0.35873523 -0.3578396
0.028348956 0.035354994
-0.61619514 -0.6153823
0.8499997 0.8513867
failed on obs
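Since `np.array_equal` is an exact comparison, it may help to check whether the mismatch is just rounding noise or something larger. A rough sketch using the two final observations copied from the output above (the variable names are mine):

```python
import numpy as np

# Final observations copied from the printed output of env 1 and env 2.
obs_env1 = np.array([
    -0.22058617, -0.07561598, 0.997137, 0.10426999, -0.00999914, 0.11529232,
    0.162765, -0.11414797, -1.0032024, 0.16360427, 0.50044924, 0.0312518,
    -0.58839995, 0.32609588, -0.03842217, -0.13290787, 0.2856137, 0.26151842,
    1.1269124, 0.05937529, -0.35873523, 0.02834896, -0.61619514, 0.8499997,
    1.0, 0.0, 0.0, 0.0,
])
obs_env2 = np.array([
    -0.22060278, -0.07551313, 0.9971448, 0.10177492, -0.01480584, 0.11399316,
    0.1625488, -0.11415483, -1.0028749, 0.17036478, 0.49990714, 0.03061081,
    -0.58719516, 0.33724806, -0.03812427, -0.13370141, 0.28572914, 0.2649531,
    1.1260623, 0.05638258, -0.3578396, 0.03535499, -0.6153823, 0.8513867,
    1.0, 0.0, 0.0, 0.0,
])

diff = np.abs(obs_env1 - obs_env2)
differing = np.flatnonzero(diff > 0)  # indices where the entries are not identical
worst = int(np.argmax(diff))          # index of the largest gap

print("entries that differ:", differing.size, "of", diff.size)
print("largest gap:", diff[worst], "at index", worst)
print("equal within 1e-6:", np.allclose(obs_env1, obs_env2, rtol=0.0, atol=1e-6))
```

For these values, 24 of the 28 entries differ and the largest gap is about 0.011, far above float32 rounding error, so loosening the comparison to a small tolerance would not make the check pass.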