Eclectic-Sheep / sheeprl

Distributed Reinforcement Learning accelerated by Lightning Fabric
https://eclecticsheep.ai
Apache License 2.0

Error when running MUJOCO and DMC #176

LYK-love closed this issue 6 months ago

LYK-love commented 6 months ago

My env:

  1. Ubuntu 22.04, x86_64
  2. Python 3.10
  3. Installed the dependencies with pip install .
  4. Installed libglew2.2 and set MUJOCO_GL=egl
  5. Installed PyOpenGL-accelerate with pip install PyOpenGL-accelerate
  6. Installed the DMC extras with pip install -e .[dmc]

The command I tried to run:

python sheeprl.py exp=dreamer_v3 env=gym env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb] 

Output:

<....>
Seed set to 42
Process Worker<AsyncVectorEnv>-0:
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/vector/async_vector_env.py", line 621, in _worker_shared_memory
    env = env_fn()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/vector/utils/misc.py", line 35, in __call__
    return self.fn()
  File "/home/lyk/Projects/sheeprl/sheeprl/envs/wrappers.py", line 83, in __init__
    super().__init__(self._env_fn())
  File "/home/lyk/Projects/sheeprl/sheeprl/utils/env.py", line 111, in thunk
    env = gym.wrappers.PixelObservationWrapper(
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/wrappers/pixel_observation.py", line 146, in __init__
    pixels = self._render(**render_kwargs[pixel_key])
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/wrappers/pixel_observation.py", line 212, in _render
    render = self.env.render(*args, **kwargs)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/core.py", line 471, in render
    return self.env.render()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/wrappers/order_enforcing.py", line 70, in render
    return self.env.render(*args, **kwargs)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/wrappers/env_checker.py", line 65, in render
    return env_render_passive_checker(self.env, *args, **kwargs)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/utils/passive_env_checker.py", line 362, in env_render_passive_checker
    result = env.render()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_env.py", line 409, in render
    return self.mujoco_renderer.render(
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 646, in render
    viewer = self._get_viewer(render_mode=render_mode)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 686, in _get_viewer
    self.viewer = OffScreenViewer(self.model, self.data)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 144, in __init__
    super().__init__(model, data, width, height)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 61, in __init__
    self.con = mujoco.MjrContext(self.model, mujoco.mjtFontScale.mjFONTSCALE_150)
mujoco.FatalError: Offscreen framebuffer is not complete, error 0x8cdd
Process Worker<AsyncVectorEnv>-1:
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/vector/async_vector_env.py", line 621, in _worker_shared_memory
    env = env_fn()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/vector/utils/misc.py", line 35, in __call__
    return self.fn()
  File "/home/lyk/Projects/sheeprl/sheeprl/envs/wrappers.py", line 83, in __init__
    super().__init__(self._env_fn())
  File "/home/lyk/Projects/sheeprl/sheeprl/utils/env.py", line 111, in thunk
    env = gym.wrappers.PixelObservationWrapper(
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/wrappers/pixel_observation.py", line 146, in __init__
    pixels = self._render(**render_kwargs[pixel_key])
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/wrappers/pixel_observation.py", line 212, in _render
    render = self.env.render(*args, **kwargs)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/core.py", line 471, in render
    return self.env.render()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/wrappers/order_enforcing.py", line 70, in render
    return self.env.render(*args, **kwargs)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/wrappers/env_checker.py", line 65, in render
    return env_render_passive_checker(self.env, *args, **kwargs)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/utils/passive_env_checker.py", line 362, in env_render_passive_checker
    result = env.render()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_env.py", line 409, in render
    return self.mujoco_renderer.render(
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 646, in render
    viewer = self._get_viewer(render_mode=render_mode)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 686, in _get_viewer
    self.viewer = OffScreenViewer(self.model, self.data)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 144, in __init__
    super().__init__(model, data, width, height)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/envs/mujoco/mujoco_rendering.py", line 61, in __init__
    self.con = mujoco.MjrContext(self.model, mujoco.mjtFontScale.mjFONTSCALE_150)
mujoco.FatalError: Offscreen framebuffer is not complete, error 0x8cdd
Process Worker<AsyncVectorEnv>-2:
<....>

I also ran

python sheeprl.py exp=dreamer_v3 env=dmc env.id=walker_walk algo.cnn_keys.encoder=[rgb]

But I got the following error:

Seed set to 42
[2023-12-21 01:08:52,489][absl][INFO] - MUJOCO_GL=egl, attempting to import specified OpenGL backend.
[2023-12-21 01:08:52,558][absl][INFO] - MuJoCo library version is: 3.1.1
/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/gymnasium/experimental/wrappers/rendering.py:166: UserWarning: WARN: Overwriting existing videos at /home/lyk/Projects/sheeprl/logs/runs/dreamer_v3/walker_walk/2023-12-21_01-08-52_dreamer_v3_walker_walk_42/version_0/train_videos folder (try specifying a different `video_folder` for the `RecordVideo` wrapper if this is not desired)
  logger.warn(
Encoder CNN keys: ['rgb']
Encoder MLP keys: []
Decoder CNN keys: ['rgb']
Decoder MLP keys: []
/home/lyk/Projects/sheeprl/sheeprl/envs/wrappers.py:116: UserWarning: WARN: RESET - Restarting env after crash with EGLError: EGLError(
    err = EGL_BAD_ALLOC,
    baseOperation = eglCreateContext,
    cArguments = (
        <OpenGL._opaque.EGLDisplay_pointer object at 0x7fdd2bf90340>,
        <OpenGL._opaque.EGLConfig_pointer object at 0x7fdd20e564c0>,
        <OpenGL._opaque.EGLContext_pointer object at 0x7fdd2bcf23c0>,
        None,
    ),
    result = <OpenGL._opaque.EGLContext_pointer object at 0x7fdd20e56940>
)
  gym.logger.warn(f"RESET - Restarting env after crash with {type(e).__name__}: {e}")
/home/lyk/Projects/sheeprl/sheeprl/envs/wrappers.py:116: UserWarning: WARN: RESET - Restarting env after crash with EGLError: EGLError(
    err = EGL_BAD_ALLOC,
    baseOperation = eglCreateContext,
    cArguments = (
        <OpenGL._opaque.EGLDisplay_pointer object at 0x7fdd2bf90340>,
        <OpenGL._opaque.EGLConfig_pointer object at 0x7fdd20e554c0>,
        <OpenGL._opaque.EGLContext_pointer object at 0x7fdd2bcf23c0>,
        None,
    ),
    result = <OpenGL._opaque.EGLContext_pointer object at 0x7fdd20e558c0>
)
  gym.logger.warn(f"RESET - Restarting env after crash with {type(e).__name__}: {e}")
/home/lyk/Projects/sheeprl/sheeprl/envs/wrappers.py:116: UserWarning: WARN: RESET - Restarting env after crash with EGLError: EGLError(
    err = EGL_BAD_ALLOC,
    baseOperation = eglCreateContext,
    cArguments = (
        <OpenGL._opaque.EGLDisplay_pointer object at 0x7fdd2bf90340>,
        <OpenGL._opaque.EGLConfig_pointer object at 0x7fdd20e554c0>,
        <OpenGL._opaque.EGLContext_pointer object at 0x7fdd2bcf23c0>,
        None,
    ),
    result = <OpenGL._opaque.EGLContext_pointer object at 0x7fdd20e558c0>
)
  gym.logger.warn(f"RESET - Restarting env after crash with {type(e).__name__}: {e}")
/home/lyk/Projects/sheeprl/sheeprl/envs/wrappers.py:116: UserWarning: WARN: RESET - Restarting env after crash with EGLError: EGLError(
    err = EGL_BAD_ALLOC,
    baseOperation = eglCreateContext,
    cArguments = (
        <OpenGL._opaque.EGLDisplay_pointer object at 0x7fdd2bf90340>,
        <OpenGL._opaque.EGLConfig_pointer object at 0x7fdd20e554c0>,
        <OpenGL._opaque.EGLContext_pointer object at 0x7fdd2bcf23c0>,
        None,
    ),
    result = <OpenGL._opaque.EGLContext_pointer object at 0x7fdd20e558c0>
)
  gym.logger.warn(f"RESET - Restarting env after crash with {type(e).__name__}: {e}")
Exception ignored in: <function ContextBase.__del__ at 0x7fdd2bcde4d0>
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 118, in __del__
    self._free_unconditionally()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
    self._render_executor.terminate(self._free_on_executor_thread)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/executor/render_executor.py", line 144, in terminate
    cleanup_callable()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 100, in _free_on_executor_thread
    while self._patients:
AttributeError: 'EGLContext' object has no attribute '_patients'
Exception ignored in: <function ContextBase.__del__ at 0x7fdd2bcde4d0>
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 118, in __del__
    self._free_unconditionally()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
    self._render_executor.terminate(self._free_on_executor_thread)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/executor/render_executor.py", line 144, in terminate
    cleanup_callable()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 100, in _free_on_executor_thread
    while self._patients:
AttributeError: 'EGLContext' object has no attribute '_patients'
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
ValueError: ctypes objects containing pointers cannot be pickled
Exception ignored in: <function ContextBase.__del__ at 0x7fdd2bcde4d0>
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 118, in __del__
    self._free_unconditionally()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
    self._render_executor.terminate(self._free_on_executor_thread)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/executor/render_executor.py", line 144, in terminate
    cleanup_callable()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 100, in _free_on_executor_thread
    while self._patients:
AttributeError: 'EGLContext' object has no attribute '_patients'
Traceback (most recent call last):
Exception ignored in: <function ContextBase.__del__ at 0x7fdd2bcde4d0>  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/multiprocessing/queues.py", line 244, in _feed
    obj = _ForkingPickler.dumps(obj)

  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 118, in __del__
ValueError: ctypes objects containing pointers cannot be pickled
Exception ignored in: <function ContextBase.__del__ at 0x7fdd2bcde4d0>
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 118, in __del__
    self._free_unconditionally()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
    self._free_unconditionally()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
    self._render_executor.terminate(self._free_on_executor_thread)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/executor/render_executor.py", line 144, in terminate
    self._render_executor.terminate(self._free_on_executor_thread)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/executor/render_executor.py", line 144, in terminate
    cleanup_callable()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 100, in _free_on_executor_thread
    while self._patients:
AttributeError: 'EGLContext' object has no attribute '_patients'
    cleanup_callable()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 100, in _free_on_executor_thread
    while self._patients:
AttributeError: 'EGLContext' object has no attribute '_patients'
Exception ignored in: <function ContextBase.__del__ at 0x7fdd2bcde4d0>
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 118, in __del__
    self._free_unconditionally()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
    self._render_executor.terminate(self._free_on_executor_thread)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/executor/render_executor.py", line 144, in terminate
    cleanup_callable()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 100, in _free_on_executor_thread
    while self._patients:
AttributeError: 'EGLContext' object has no attribute '_patients'
Exception ignored in: <function ContextBase.__del__ at 0x7fdd2bcde4d0>
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 118, in __del__
    self._free_unconditionally()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
    self._render_executor.terminate(self._free_on_executor_thread)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/executor/render_executor.py", line 144, in terminate
    cleanup_callable()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 100, in _free_on_executor_thread
Exception ignored in: <function ContextBase.__del__ at 0x7fdd2bcde4d0>
Traceback (most recent call last):
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 118, in __del__
    while self._patients:
AttributeError: 'EGLContext' object has no attribute '_patients'
    self._free_unconditionally()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 115, in _free_unconditionally
    self._render_executor.terminate(self._free_on_executor_thread)
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/executor/render_executor.py", line 144, in terminate
    cleanup_callable()
  File "/home/lyk/.conda/envs/sheeprl/lib/python3.10/site-packages/dm_control/_render/base.py", line 100, in _free_on_executor_thread
    while self._patients:
AttributeError: 'EGLContext' object has no attribute '_patients'
michele-milesi commented 6 months ago

Hi @LYK-love, thank you for reporting this issue. Can you provide us with additional information about the infrastructure you are using? Which SheepRL version are you using? I also ask you to check the dm_control rendering documentation (https://github.com/google-deepmind/dm_control?tab=readme-ov-file#rendering): do you satisfy the EXT_platform_device requirement?

Please let us know, thanks.

LYK-love commented 6 months ago

Sure, I'm using SheepRL 0.4.9. My GPU is an Nvidia 1080 Ti with driver version 545.23.08. I don't think the driver is too old to support EXT_platform_device. I also tried changing the OpenGL backend to osmesa and glfw; both failed.

michele-milesi commented 6 months ago

Hi @LYK-love, thank you for your patience. One thing you can try is to set the env.sync_env parameter to True. In some cases, gymnasium.vector.AsyncVectorEnv has caused issues with the MuJoCo environments.

Let me know if it works with the SyncVectorEnv by running:

python sheeprl.py exp=dreamer_v3 env=gym env.id=Walker2d-v4 algo.cnn_keys.encoder=[rgb] env.sync_env=True

Thanks.
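
For readers unfamiliar with the two vectorization modes, the following is a minimal sketch of the kind of switch a flag like env.sync_env controls. It is not SheepRL's actual code: the helper names vector_env_class_name and make_vector_env are hypothetical, and only the gymnasium.vector.SyncVectorEnv and gymnasium.vector.AsyncVectorEnv classes come from the library. SyncVectorEnv steps every environment in the main process, whereas AsyncVectorEnv starts one worker process per environment, which is where the "Process Worker<AsyncVectorEnv>-0" tracebacks above originate.

```python
from functools import partial


def vector_env_class_name(sync_env: bool) -> str:
    """Map a boolean flag to the name of a Gymnasium vector-env class."""
    return "SyncVectorEnv" if sync_env else "AsyncVectorEnv"


def make_vector_env(env_id: str, num_envs: int, sync_env: bool):
    """Build num_envs copies of env_id, in-process or in worker processes."""
    # Imported lazily so the helper above can be used without Gymnasium installed.
    import gymnasium as gym

    # One zero-argument factory per environment, as both vector classes expect.
    env_fns = [partial(gym.make, env_id) for _ in range(num_envs)]
    vector_cls = getattr(gym.vector, vector_env_class_name(sync_env))
    return vector_cls(env_fns)
```

With sync_env=True, any rendering context is created in the same process that launched training, which removes the multiprocessing layer as a possible source of failure when debugging.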

LYK-love commented 6 months ago

Thanks! With env.sync_env=True, I can run both the MuJoCo and DMC tasks. I think it would be helpful to add this to your documentation.

By the way, here is some more information from my side. In the original post I used a machine with a messed-up environment, so everything went wrong, including other DreamerV3 codebases on GitHub. The error messages are from that time, so they are quite chaotic and there is no need to dig into their cause. After that, I switched to a Google Cloud server with exactly the same configuration (CUDA driver, SheepRL version, all Python dependencies installed, and EGL as the OpenGL backend via export MUJOCO_GL=egl). I appended env.sync_env=True to my command and everything works. Since the Google Cloud server is a clean, vanilla environment, I expect all commands to run on any machine as long as you follow the instructions in the repo. If you get a strange error, trying a new machine (in case something is messed up on the old one) may be a good choice.
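
To reproduce this working setup from a script rather than an interactive shell, something like the sketch below might help. It assumes it is launched from the repository root where sheeprl.py lives. The helper names build_command and build_env are hypothetical and not part of SheepRL; the overrides and the MUJOCO_GL value are the ones quoted in this thread.

```python
import os


def build_command(env: str, env_id: str, sync_env: bool = True) -> list[str]:
    """Return the argv list for the DreamerV3 runs quoted in this thread."""
    return [
        "python",
        "sheeprl.py",
        "exp=dreamer_v3",
        f"env={env}",
        f"env.id={env_id}",
        "algo.cnn_keys.encoder=[rgb]",
        f"env.sync_env={sync_env}",
    ]


def build_env(backend: str = "egl") -> dict[str, str]:
    """Copy the current environment and set MUJOCO_GL for the child process."""
    child_env = dict(os.environ)
    child_env["MUJOCO_GL"] = backend
    return child_env


# Example launch (commented out so importing this file has no side effects):
# import subprocess
# subprocess.run(build_command("dmc", "walker_walk"), env=build_env(), check=True)
```

Passing the command as a list, without a shell, means the brackets in algo.cnn_keys.encoder=[rgb] are never interpreted as a glob pattern, and setting MUJOCO_GL in the child environment avoids depending on what the parent shell exported.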

michele-milesi commented 6 months ago

Yeah, I will add it to the documentation and set env.sync_env=True as the default configuration for MuJoCo environments. Meanwhile, I will investigate why gymnasium.vector.AsyncVectorEnv does not work correctly with MuJoCo. Thanks.