google-research / planet

Learning Latent Dynamics for Planning from Pixels
https://danijar.com/planet
Apache License 2.0
1.18k stars 202 forks

Cannot make context current on thread #5

Closed · danijar closed 5 years ago

danijar commented 5 years ago

@astronautas I'm starting a new thread for this to keep things separate. Thank you for looking into this further! To recap, the error message you reported comes from dm_control's renderer:

RuntimeError: Cannot make context <dm_control._render.glfw_renderer.GLFWContext object at 0x7f7b417a0550> current on thread <_DummyThread(Dummy-5, started daemon 140166375134976)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140166358349568)>.

Here are a few suggestions of what to try:

Note that TensorFlow has only one process, but that process has a thread pool. When data gets collected, we call tf.py_func() to step the environment. This can be called from any of the threads, depending on where TensorFlow decides to schedule the operation.
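
For illustration, here is a minimal, self-contained sketch (TF 1.x style; not part of PlaNet itself) showing that the Python function handed to tf.py_func runs on whichever thread the session's pool picks, which is why a GL context made current in one call can already be bound to a different thread in the next:

import threading

import numpy as np
import tensorflow as tf

def fake_step(action):
  # In PlaNet this is where the environment would step and render; here we
  # only report which pool thread executed the call.
  print('py_func executed on ' + threading.current_thread().name)
  return np.float32(0.0)

action = tf.constant(0.0)
step_op = tf.py_func(fake_step, [action], tf.float32, name='step')

with tf.Session() as sess:
  for _ in range(3):
    sess.run(step_op)  # may report different _DummyThread names across calls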

astronautas commented 5 years ago

@danijar

  1. I tried commenting out the ExternalProcess class and using the GLFW, EGL, and Mesa rendering options. I receive the same error with any of them. I am attaching a log excerpt:
Error message
INFO:tensorflow:
--------------------------------------------------
Epoch 1 phase train (phase step 0, global step 0).
2019-03-18 23:02:01.348377: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 783.09MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
step/score/loss/zs_entropy/zs_divergence =  [0, -nan, 11821.6729, 35.320507, 2.84842563]
2019-03-18 23:02:02.203470: W tensorflow/core/common_runtime/bfc_allocator.cc:211] Allocator (GPU_0_bfc) ran out of memory trying to allocate 783.09MiB. The caller indicates that this is not a failure, but may mean that there could be performance gains if more memory were available.
step/score/loss/zs_entropy/zs_divergence =  [15, -nan, 11833.0537, 35.260334, 3.1794312]
2019-03-18 23:02:05.863356: W tensorflow/core/framework/op_kernel.cc:1261] Unknown: exceptions.RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.
Traceback (most recent call last):

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)

  File "planet/control/in_graph_batch_env.py", line 95, in 
    lambda a: self._batch_env.step(a)[:3], [action],

  File "planet/control/batch_env.py", line 86, in step
    for env, action in zip(self._envs, actions)]

  File "planet/control/wrappers.py", line 90, in step
    obs, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 367, in step
    transition = self._env.step(action, *args, **kwargs)

  File "planet/control/wrappers.py", line 445, in step
    observ, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 156, in step
    obs[self._key] = self._render_image()

  File "planet/control/wrappers.py", line 165, in _render_image
    image = self._env.render('rgb_array')

  File "planet/control/wrappers.py", line 261, in render
    *self._render_size, camera_id=self._camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 171, in render
    physics=self, height=height, width=width, camera_id=camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 574, in __init__
    with self._physics.contexts.gl.make_current() as ctx:

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/_render/base.py", line 116, in make_current
    _CURRENT_THREAD_FOR_CONTEXT[id(self)]))

RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.

WARNING:tensorflow:Worker 006d3f0c-93c1-4a10-aee4-32ab0c8b125d run 00001: Exception:
Traceback (most recent call last):
  File "planet/training/running.py", line 199, in __iter__
    for value in self._process_fn(self._logdir, *args):
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 91, in process
    training.define_model, dataset, logdir, config):
  File "planet/training/utility.py", line 179, in train
    for score in trainer.iterate(config.max_steps):
  File "planet/training/trainer.py", line 201, in iterate
    summary, mean_score, global_step = sess.run(phase.op, phase.feed)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 929, in run
    run_metadata_ptr)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1328, in _do_run
    run_metadata)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/client/session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
UnknownError: exceptions.RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.
Traceback (most recent call last):

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)

  File "planet/control/in_graph_batch_env.py", line 95, in 
    lambda a: self._batch_env.step(a)[:3], [action],

  File "planet/control/batch_env.py", line 86, in step
    for env, action in zip(self._envs, actions)]

  File "planet/control/wrappers.py", line 90, in step
    obs, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 367, in step
    transition = self._env.step(action, *args, **kwargs)

  File "planet/control/wrappers.py", line 445, in step
    observ, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 156, in step
    obs[self._key] = self._render_image()

  File "planet/control/wrappers.py", line 165, in _render_image
    image = self._env.render('rgb_array')

  File "planet/control/wrappers.py", line 261, in render
    *self._render_size, camera_id=self._camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 171, in render
    physics=self, height=height, width=width, camera_id=camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 574, in __init__
    with self._physics.contexts.gl.make_current() as ctx:

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/_render/base.py", line 116, in make_current
    _CURRENT_THREAD_FOR_CONTEXT[id(self)]))

RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.

     [[node graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/step (defined at planet/control/in_graph_batch_env.py:96)  = PyFunc[Tin=[DT_FLOAT], Tout=[DT_UINT8, DT_FLOAT, DT_BOOL], token="pyfunc_7", _device="/job:localhost/replica:0/task:0/device:CPU:0"](graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/Identity_5/_847)]]
     [[{{node GroupCrossDeviceControlEdges_0/graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/group_deps/_874}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_10532...group_deps", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_cloopgraph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/Equal/_27)]]

Caused by op u'graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/step', defined at:
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 133, in 
    tf.app.run(lambda _: main(args_), remaining)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 133, in 
    tf.app.run(lambda _: main(args_), remaining)
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 106, in main
    for unused_score in run:
  File "planet/training/running.py", line 199, in __iter__
    for value in self._process_fn(self._logdir, *args):
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 91, in process
    training.define_model, dataset, logdir, config):
  File "planet/training/utility.py", line 160, in train
    score, summary = model_fn(data, trainer, config)
  File "planet/training/define_model.py", line 133, in define_model
    name='should_collect_' + params.task.name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2086, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1930, in BuildCondBranch
    original_result = fn()
  File "planet/training/utility.py", line 254, in simulate_episodes
    1, agent_config, name=name)
  File "planet/control/simulate.py", line 42, in simulate
    env_processes=env_processes)
  File "planet/control/simulate.py", line 78, in collect_rollouts
    initializer, parallel_iterations=1)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 718, in scan
    maximum_iterations=n)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3291, in while_loop
    return_same_structure)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3004, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2939, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3260, in 
    body = lambda i, lv: (i + 1, orig_body(*lv))
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 697, in compute
    a_out = fn(packed_a, packed_elems)
  File "planet/control/simulate.py", line 63, in simulate_fn
    reset=tf.equal(step, 0))
  File "planet/control/simulate.py", line 219, in simulate_step
    step, score, length = _define_step()
  File "planet/control/simulate.py", line 150, in _define_step
    with tf.control_dependencies([batch_env.step(action)]):
  File "planet/control/in_graph_batch_env.py", line 96, in step
    [observ_dtype, tf.float32, tf.bool], name='step')
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 457, in py_func
    func=func, inp=inp, Tout=Tout, stateful=stateful, eager=False, name=name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 281, in _internal_py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 129, in py_func
    "PyFunc", input=input, token=token, Tout=Tout, name=name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): exceptions.RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.
Traceback (most recent call last):

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)

  File "planet/control/in_graph_batch_env.py", line 95, in 
    lambda a: self._batch_env.step(a)[:3], [action],

  File "planet/control/batch_env.py", line 86, in step
    for env, action in zip(self._envs, actions)]

  File "planet/control/wrappers.py", line 90, in step
    obs, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 367, in step
    transition = self._env.step(action, *args, **kwargs)

  File "planet/control/wrappers.py", line 445, in step
    observ, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 156, in step
    obs[self._key] = self._render_image()

  File "planet/control/wrappers.py", line 165, in _render_image
    image = self._env.render('rgb_array')

  File "planet/control/wrappers.py", line 261, in render
    *self._render_size, camera_id=self._camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 171, in render
    physics=self, height=height, width=width, camera_id=camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 574, in __init__
    with self._physics.contexts.gl.make_current() as ctx:

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/_render/base.py", line 116, in make_current
    _CURRENT_THREAD_FOR_CONTEXT[id(self)]))

RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.

     [[node graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/step (defined at planet/control/in_graph_batch_env.py:96)  = PyFunc[Tin=[DT_FLOAT], Tout=[DT_UINT8, DT_FLOAT, DT_BOOL], token="pyfunc_7", _device="/job:localhost/replica:0/task:0/device:CPU:0"](graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/Identity_5/_847)]]
     [[{{node GroupCrossDeviceControlEdges_0/graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/group_deps/_874}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_10532...group_deps", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_cloopgraph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/Equal/_27)]]

WARNING:tensorflow:Worker 006d3f0c-93c1-4a10-aee4-32ab0c8b125d run 00001: Failed.
Traceback (most recent call last):
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 133, in 
    tf.app.run(lambda _: main(args_), remaining)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 133, in 
    tf.app.run(lambda _: main(args_), remaining)
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 106, in main
    for unused_score in run:
  File "planet/training/running.py", line 210, in __iter__
    raise e
tensorflow.python.framework.errors_impl.UnknownError: exceptions.RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.
Traceback (most recent call last):

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)

  File "planet/control/in_graph_batch_env.py", line 95, in 
    lambda a: self._batch_env.step(a)[:3], [action],

  File "planet/control/batch_env.py", line 86, in step
    for env, action in zip(self._envs, actions)]

  File "planet/control/wrappers.py", line 90, in step
    obs, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 367, in step
    transition = self._env.step(action, *args, **kwargs)

  File "planet/control/wrappers.py", line 445, in step
    observ, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 156, in step
    obs[self._key] = self._render_image()

  File "planet/control/wrappers.py", line 165, in _render_image
    image = self._env.render('rgb_array')

  File "planet/control/wrappers.py", line 261, in render
    *self._render_size, camera_id=self._camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 171, in render
    physics=self, height=height, width=width, camera_id=camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 574, in __init__
    with self._physics.contexts.gl.make_current() as ctx:

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/_render/base.py", line 116, in make_current
    _CURRENT_THREAD_FOR_CONTEXT[id(self)]))

RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.

     [[node graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/step (defined at planet/control/in_graph_batch_env.py:96)  = PyFunc[Tin=[DT_FLOAT], Tout=[DT_UINT8, DT_FLOAT, DT_BOOL], token="pyfunc_7", _device="/job:localhost/replica:0/task:0/device:CPU:0"](graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/Identity_5/_847)]]
     [[{{node GroupCrossDeviceControlEdges_0/graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/group_deps/_874}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_10532...group_deps", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_cloopgraph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/Equal/_27)]]

Caused by op u'graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/step', defined at:
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 133, in 
    tf.app.run(lambda _: main(args_), remaining)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 125, in run
    _sys.exit(main(argv))
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 133, in 
    tf.app.run(lambda _: main(args_), remaining)
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 106, in main
    for unused_score in run:
  File "planet/training/running.py", line 199, in __iter__
    for value in self._process_fn(self._logdir, *args):
  File "/home/lukas/workspace/planet_src/planet/scripts/train.py", line 91, in process
    training.define_model, dataset, logdir, config):
  File "planet/training/utility.py", line 160, in train
    score, summary = model_fn(data, trainer, config)
  File "planet/training/define_model.py", line 133, in define_model
    name='should_collect_' + params.task.name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2086, in cond
    orig_res_t, res_t = context_t.BuildCondBranch(true_fn)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 1930, in BuildCondBranch
    original_result = fn()
  File "planet/training/utility.py", line 254, in simulate_episodes
    1, agent_config, name=name)
  File "planet/control/simulate.py", line 42, in simulate
    env_processes=env_processes)
  File "planet/control/simulate.py", line 78, in collect_rollouts
    initializer, parallel_iterations=1)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 718, in scan
    maximum_iterations=n)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3291, in while_loop
    return_same_structure)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3004, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 2939, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3260, in 
    body = lambda i, lv: (i + 1, orig_body(*lv))
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/functional_ops.py", line 697, in compute
    a_out = fn(packed_a, packed_elems)
  File "planet/control/simulate.py", line 63, in simulate_fn
    reset=tf.equal(step, 0))
  File "planet/control/simulate.py", line 219, in simulate_step
    step, score, length = _define_step()
  File "planet/control/simulate.py", line 150, in _define_step
    with tf.control_dependencies([batch_env.step(action)]):
  File "planet/control/in_graph_batch_env.py", line 96, in step
    [observ_dtype, tf.float32, tf.bool], name='step')
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 457, in py_func
    func=func, inp=inp, Tout=Tout, stateful=stateful, eager=False, name=name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 281, in _internal_py_func
    input=inp, token=token, Tout=Tout, name=name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/gen_script_ops.py", line 129, in py_func
    "PyFunc", input=input, token=token, Tout=Tout, name=name)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
    op_def=op_def)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
    return func(*args, **kwargs)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 3274, in create_op
    op_def=op_def)
  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
    self._traceback = tf_stack.extract_stack()

UnknownError (see above for traceback): exceptions.RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.
Traceback (most recent call last):

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/tensorflow/python/ops/script_ops.py", line 206, in __call__
    ret = func(*args)

  File "planet/control/in_graph_batch_env.py", line 95, in 
    lambda a: self._batch_env.step(a)[:3], [action],

  File "planet/control/batch_env.py", line 86, in step
    for env, action in zip(self._envs, actions)]

  File "planet/control/wrappers.py", line 90, in step
    obs, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 367, in step
    transition = self._env.step(action, *args, **kwargs)

  File "planet/control/wrappers.py", line 445, in step
    observ, reward, done, info = self._env.step(action)

  File "planet/control/wrappers.py", line 156, in step
    obs[self._key] = self._render_image()

  File "planet/control/wrappers.py", line 165, in _render_image
    image = self._env.render('rgb_array')

  File "planet/control/wrappers.py", line 261, in render
    *self._render_size, camera_id=self._camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 171, in render
    physics=self, height=height, width=width, camera_id=camera_id)

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/mujoco/engine.py", line 574, in __init__
    with self._physics.contexts.gl.make_current() as ctx:

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/contextlib.py", line 17, in __enter__
    return self.gen.next()

  File "/home/lukas/miniconda3/envs/planet/lib/python2.7/site-packages/dm_control/_render/base.py", line 116, in make_current
    _CURRENT_THREAD_FOR_CONTEXT[id(self)]))

RuntimeError: Cannot make context  current on thread <_DummyThread(Dummy-5, started daemon 140200046999296)>: this context is already current on another thread <_DummyThread(Dummy-4, started daemon 140200038606592)>.

     [[node graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/step (defined at planet/control/in_graph_batch_env.py:96)  = PyFunc[Tin=[DT_FLOAT], Tout=[DT_UINT8, DT_FLOAT, DT_BOOL], token="pyfunc_7", _device="/job:localhost/replica:0/task:0/device:CPU:0"](graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/Identity_5/_847)]]
     [[{{node GroupCrossDeviceControlEdges_0/graph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/simulate/environment/simulate/group_deps/_874}} = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_10532...group_deps", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"](^_cloopgraph/collection/should_collect_cheetah_run/simulate-1/train-cheetah_run-cem-12/scan/while/Equal/_27)]]
  2. Wrapping the environment with DeepMindWrapper works as expected. EDIT: I've managed to wrap the env with ExternalProcess and make it work. That's weird; I have no idea why it works now :man_shrugging:. The only thing I changed was swapping the multiprocessing library for multiprocess, as there were method pickling issues (a sketch of that swap follows below).
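
For context, a minimal sketch of that swap (not the actual ExternalProcess code): multiprocess is a third-party fork that mirrors the multiprocessing API but serializes with dill, which copes with environment constructors such as lambdas or bound methods that the standard pickler rejects.

try:
  import multiprocess as mp     # dill-based pickling (pip install multiprocess)
except ImportError:
  import multiprocessing as mp  # standard-library fallback

def _worker(conn, env_ctor):
  # The worker owns the environment; only the constructor crosses the pipe.
  env = env_ctor()
  conn.send(type(env).__name__)
  conn.close()

def start_worker(env_ctor):
  # env_ctor must be picklable by the chosen backend; with multiprocess,
  # lambdas and bound methods work as well.
  parent, child = mp.Pipe()
  proc = mp.Process(target=_worker, args=(child, env_ctor))
  proc.start()
  return parent, proc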

Please, @danijar, could you look again into the source code and issue #6? Note my last comment there - there might be something wrong with the order of the "reset" and "close" messages. Importantly, I tried to call render directly on ExternalProcess, which started to yield errors similar to those in issue #6. Maybe that will be of some help.

Here's the source code:

from dm_control import suite
import numpy as np
from dm_control import viewer
from threading import Thread
import cv2
from planet.control.wrappers import ExternalProcess, ActionRepeat, LimitDuration, PixelObservations, ConvertTo32Bit
from planet.control.wrappers import DeepMindWrapper

def rewards(env):
  # Step through an episode and print out reward, discount and observation.
  action_spec = env.action_spec()
  time_step = env.reset()

  while True:
    action = np.random.uniform(action_spec.minimum,
                              action_spec.maximum,
                              size=action_spec.shape)

    time_step = env.step(action)
    img = env.physics.render()

    cv2.imshow("img", img)
    cv2.waitKey(0)

# Load one task:
# env = suite.load(domain_name="cartpole", task_name="swingup")
def env_ctor():
  env = DeepMindWrapper(suite.load("cartpole", "swingup"), (64, 64))
  env = ActionRepeat(env, 2)
  # env = LimitDuration(env, 1000)
  env = PixelObservations(env, (64, 64), np.uint8, 'image')
  env = ConvertTo32Bit(env)

  return env

env = ExternalProcess(env_ctor)

# Iterate over a task set:
# for domain_name, task_name in suite.BENCHMARKING:
#   env = suite.load(domain_name, task_name)

action_spec = env.action_space
time_step = env.reset()

while True:
  action = action_spec.sample()

  time_step = env.step(action)
  img = env.call("render")()

  cv2.imshow("img", img)
  cv2.waitKey(1)

# viewer.launch(env)
danijar commented 5 years ago

@astronautas Are your results for different dm_control rendering options consistent with https://github.com/google-research/planet/issues/6#issuecomment-474493971? The error message you attached seems to be specific to GLFW.

In your example script, you're trying to call render on the physics object directly. I doubt this will work, since the ExternalProcess wrapper would try to pickle the physics object to send it over to the main process. Instead, can you look at the observations returned by the environment? That way, the images will be rendered by the PixelObservations wrapper in the same process that the environment lives in.
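
For example, a minimal sketch along those lines, assuming the env_ctor and wrappers from the script above (so PixelObservations stores the rendered frame in the observation dict under the 'image' key):

import cv2

from planet.control.wrappers import ExternalProcess

env = ExternalProcess(env_ctor)  # env_ctor as defined in the script above
obs = env.reset()

while True:
  action = env.action_space.sample()
  obs, reward, done, info = env.step(action)
  frame = obs['image']      # rendered inside the environment's own process
  cv2.imshow('img', frame)  # note: OpenCV expects BGR, so colors may look swapped
  cv2.waitKey(1)
  if done:
    obs = env.reset()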

It's not impossible that the problem lies in the reset message. However, it works well for many other people, including myself. I'm not sure how a race condition could occur, since the env worker just pulls one message after another from the pipe. That said, please feel free to look at the external process wrapper and see if you find a problem -- the code is quite simple. I just looked again and didn't see a problem with it.
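
To make that concrete, here is a minimal sketch (not the actual wrapper code) of the kind of single-threaded worker loop described above; because the worker process owns the environment and answers one request at a time, calls cannot interleave inside it:

def worker(conn, env_ctor):
  # Runs in the child process; the environment never leaves this process.
  env = env_ctor()
  while True:
    name, args, kwargs = conn.recv()  # blocks until the next request arrives
    if name == 'close':
      conn.close()
      break
    result = getattr(env, name)(*args, **kwargs)
    conn.send(result)                 # one reply per request, in order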

danijar commented 5 years ago

Hi @astronautas, @JamesLuoau, I've updated the dependency section of the readme to list precise versions and compatible rendering options. Could you please verify whether you still have a problem running the code under the specified setup?

astronautas commented 5 years ago

@danijar I'll reinstall all the dependencies in a new conda environment and verify whether it succeeds. I have a feeling this could work on Ubuntu 18.04 - multiple users got it working under 18.04. I have 16.04, so there might be some issues with that.

EDIT: I still cannot get it working under 16.04 with all the correct dependency versions. One thing - I was not able to install dm_control via setup.py (there is no PyPI package for it). I installed it from their GitHub repository (as instructed in README.md). Could you also specify the dm_control version?

astronautas commented 5 years ago

@danijar Solved this one, thanks again.

danijar commented 5 years ago

Thanks, awesome! What did you do to make it work?

astronautas commented 5 years ago

It's the same solution as for the "Connection reset by peer" problem. I upgraded Ubuntu from 16.04 to 18.04 and installed the latest Nvidia drivers (418). Using EGL rendering, it runs, and quite well!

So I'm not sure what exactly helped, but this could serve as a hint for other people trying to reproduce the code :)
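
As a note for reproduction: as far as I know, dm_control chooses its rendering backend based on the MUJOCO_GL environment variable, read when the library is first imported, so EGL can be requested explicitly. A minimal sketch, assuming a headless EGL setup is available:

import os
os.environ.setdefault('MUJOCO_GL', 'egl')  # must be set before importing dm_control

from dm_control import suite

env = suite.load('cheetah', 'run')
pixels = env.physics.render(64, 64, camera_id=0)  # renders off-screen via EGL
print(pixels.shape)  # (64, 64, 3)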

danijar commented 5 years ago

Thanks for letting others know and nice that it's working now!