google-deepmind / dm_control

Google DeepMind's software stack for physics-based simulation and Reinforcement Learning environments, using MuJoCo.
Apache License 2.0

Setting states does not yield consistent rendering #187

Closed: snailrowen1337 closed this issue 3 years ago

snailrowen1337 commented 3 years ago

I am running DeepMind Control and rendering with OSMesa. When I set the state, the rendered results are not consistent. Consider the example below, where I manually set the state before and after taking an action. Since I set the state both times, I would expect the renderer to produce the same output:

import numpy as np
from dm_control import suite

env = suite.load(domain_name='cartpole', task_name='swingup')
state = np.array([1.3, 5.3, 0.1, 2.3])
action = np.array([0.3])
env.reset()
phys = env.physics

phys.set_state(state)
obs1 = phys.render()
env.step(action)
phys.set_state(state)
obs2 = phys.render()
h = lambda img : hash(img.data.tobytes())
print('>>>> should be equal', h(obs1), h(obs2))

However, the rendered images are not the same and I get:

>>>> should be equal 4676723617026791659 7575396285825095944

When I comment out env.step(action), I get consistent results, so it's not an issue with the rendering itself:

>>>> should be equal -9159741910536992866 -9159741910536992866

Is there an issue with my installation, or am I misunderstanding the semantics?
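A hash only says whether two frames differ, not how much. A minimal sketch that compares the pixel arrays directly and counts differing pixels; it assumes `phys.render()` returns an `(H, W, 3)` `uint8` NumPy array (consistent with the `.data.tobytes()` calls above), and the helper name `frame_diff` is hypothetical. Small synthetic arrays stand in for real rendered frames.

```python
import numpy as np


def frame_diff(img_a: np.ndarray, img_b: np.ndarray) -> int:
    """Return the number of pixels that differ between two RGB frames.

    Hypothetical helper: assumes both frames are (H, W, 3) arrays of equal shape.
    """
    if img_a.shape != img_b.shape:
        raise ValueError(f"shape mismatch: {img_a.shape} vs {img_b.shape}")
    # A pixel counts as different if any of its channels differs.
    return int(np.any(img_a != img_b, axis=-1).sum())


# Stand-ins for two rendered frames (real ones would come from phys.render()).
obs1 = np.zeros((4, 4, 3), dtype=np.uint8)
obs2 = obs1.copy()
obs2[1, 2] = [255, 0, 0]  # perturb exactly one pixel

print("identical copies equal:", np.array_equal(obs1, obs1.copy()))
print("differing pixels:", frame_diff(obs1, obs2))
```

Unlike a hash, `np.array_equal` and a pixel count are unaffected by anything outside the image data itself.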

snailrowen1337 commented 3 years ago

After looking at https://github.com/deepmind/dm_control/issues/64, it seems like there might be issues with the warmstart buffer. So instead of manually stepping the environment, consider the following example, where I create two separate environments and render each of them twice:

import numpy as np
from dm_control import suite

def create():
    env = suite.load(domain_name='cartpole', task_name='swingup')
    state = np.array([1.3, 5.3, 0.1, 2.3])
    env.reset()
    phys = env.physics
    phys.set_state(state)
    obs1 = phys.render()
    obs2 = phys.render()
    h = lambda img : hash(img.data.tobytes())
    print('>>>> should be equal', h(obs1), h(obs2))
create()
create()

When doing this I get

>>>> should be equal -6963634081576593535 -6963634081576593535
>>>> should be equal -9021884202509515482 -9021884202509515482

So within a single environment, the rendering seems deterministic. But creating two environments and setting their states to the same value does not give deterministic rendering with OSMesa. I was hoping that this setup would eliminate issues with the warmstart buffer. Am I misunderstanding something here? Thanks!!
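Why hidden solver state can matter is easiest to see with a toy stand-in (this is not MuJoCo's constraint solver): a solver that runs a fixed number of Jacobi iterations from a warm-start guess. Because it stops before converging, two calls with identical inputs but different warm starts return different answers. That is the kind of effect the thread suspects when a state vector restores positions and velocities but not a warm-start buffer. All names here are hypothetical.

```python
import numpy as np


def jacobi_solve(A: np.ndarray, b: np.ndarray, x0: np.ndarray, iters: int = 3) -> np.ndarray:
    """Run a fixed number of Jacobi iterations for A x = b, starting from x0."""
    D = np.diag(A)              # diagonal entries of A
    R = A - np.diagflat(D)      # off-diagonal part of A
    x = x0.astype(float).copy()
    for _ in range(iters):
        x = (b - R @ x) / D
    return x


A = np.array([[4.0, 1.0], [2.0, 5.0]])  # diagonally dominant, so Jacobi converges
b = np.array([1.0, 2.0])

cold = jacobi_solve(A, b, x0=np.zeros(2))
warm = jacobi_solve(A, b, x0=np.array([0.5, 0.5]))  # stale guess left over from "earlier steps"

print("cold start:", cold)
print("warm start:", warm)
print("same inputs, same answer?", np.allclose(cold, warm))
```

With enough iterations both runs converge to the same solution; the discrepancy comes only from truncating the solve, which is why history that is invisible in the state vector can still change the result.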

snailrowen1337 commented 3 years ago

Sorry, this seems to be resolved once I set seeds properly. The original culprit seems to have been the warmstart buffer.
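Seeding helps because, as far as I understand, suite tasks draw random numbers during `reset()` to initialize each episode; the next post seeds them through `task_kwargs={'random': 32}`. A toy sketch of the principle, with a hypothetical `sample_initial_state` standing in for a task's own episode initialization: generators built from the same seed yield identical draws, so two separately created "environments" start identically.

```python
import numpy as np


def sample_initial_state(random: np.random.RandomState, dim: int = 4) -> np.ndarray:
    """Hypothetical stand-in for a task's randomized episode initialization."""
    return 0.01 * random.randn(dim)


# Same integer seed -> identical initial states, even across separate "environments".
s1 = sample_initial_state(np.random.RandomState(32))
s2 = sample_initial_state(np.random.RandomState(32))
print("same seed, equal states:", np.array_equal(s1, s2))

# A different seed gives a different sequence of draws.
s3 = sample_initial_state(np.random.RandomState(33))
print("different seed, equal states:", np.array_equal(s1, s3))
```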

snailrowen1337 commented 3 years ago

OK, so when I set the seeds properly, the rendering is consistent within a process, but not between processes. Consider the following code:

import numpy as np
from dm_control import suite

def create():
    env = suite.load(domain_name='cartpole', task_name='swingup', task_kwargs={'random': 32})
    state = np.array([1.3, 5.3, 0.1, 2.3])
    action = np.array([0.3])
    phys = env.physics
    env.reset()
    phys.set_state(state)
    env.step(action)
    obs1 = phys.render()
    obs2 = phys.render()
    h = lambda img : hash(img.data.tobytes())
    print('>>>> should be equal', h(obs1), h(obs2))

create()
create()

If I run this once with e.g. python test.py, the results are:

>>>> should be equal 8297706552909453184 8297706552909453184
>>>> should be equal 8297706552909453184 8297706552909453184

So far, so good. But if I run it again, I get:

>>>> should be equal -922062236587031648 -922062236587031648
>>>> should be equal -922062236587031648 -922062236587031648

Any ideas what might be going on here? Thanks!

snailrowen1337 commented 3 years ago

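Python's built-in hash() is not consistent across processes (for bytes and str it is salted per interpreter), so the differing values between runs don't imply different images. This is resolved!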
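The per-process salting is controlled by the `PYTHONHASHSEED` environment variable. A sketch that demonstrates it by hashing the same bytes in two child interpreters with different seeds, and swaps in `hashlib.sha256` as a fingerprint that stays the same across runs. The helper names are hypothetical, and a small synthetic array stands in for `phys.render()` output.

```python
import hashlib
import os
import subprocess
import sys

import numpy as np


def builtin_hash_in_child(data: bytes, seed: str) -> int:
    """Return hash(data) computed in a fresh interpreter run with PYTHONHASHSEED=seed."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    code = f"print(hash({data!r}))"
    out = subprocess.run([sys.executable, "-c", code], env=env,
                         capture_output=True, text=True, check=True)
    return int(out.stdout.strip())


def stable_digest(img: np.ndarray) -> str:
    """Process-independent fingerprint of an image: shape, dtype and raw bytes."""
    h = hashlib.sha256()
    h.update(str((img.shape, img.dtype.str)).encode())
    h.update(np.ascontiguousarray(img).tobytes())
    return h.hexdigest()


frame = np.arange(12, dtype=np.uint8).reshape(2, 2, 3)  # stand-in for a rendered frame
data = frame.tobytes()

h_a = builtin_hash_in_child(data, "1")
h_b = builtin_hash_in_child(data, "2")
print("built-in hash equal across processes?", h_a == h_b)
print("sha256 digest:", stable_digest(frame))
```

Including the shape and dtype in the digest keeps, for example, a `(2, 6)` array and a `(2, 2, 3)` array with the same bytes from colliding.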