PKU-MARL / DexterousHands

This is a library that provides dual dexterous hand manipulation tasks through Isaac Gym
https://pku-marl.github.io/DexterousHands/
Apache License 2.0
653 stars 78 forks

Segmentation fault #12

Open quantumiracle opened 2 years ago

quantumiracle commented 2 years ago

Hi,

When I run experiments with python train.py --task=ShadowHandOver --algo=ppo, it generates the following error:

    Algorithm:  ppo
    Python
    Averaging factor:  0.01
    Obs type: full_state
    Not connected to PVD
    +++ Using GPU PhysX
    Physics Engine: PhysX
    Physics Device: cuda:0
    GPU Pipeline: enabled
    WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
    ...
    Unhandled descriptor set 433
    Unhandled descriptor set 1788307008
    Segmentation fault (core dumped)

I know this might not be an issue with the repo itself but rather a compatibility problem with the NVIDIA GPU driver. I am posting here in case anyone has a solution.

My test GPU is an NVIDIA RTX A6000 (NVIDIA-SMI 510.85.02, driver version 510.85.02, CUDA version 11.6).

quantumiracle commented 2 years ago

An update: vulkaninfo reports the following devices:

    Devices: count = 9
        GPU id  : 0 (NVIDIA RTX A6000)
        Layer-Device Extensions: count = 0

        GPU id  : 1 (llvmpipe (LLVM 12.0.0, 256 bits))
        Layer-Device Extensions: count = 0

        GPU id  : 2 (NVIDIA RTX A6000)
        Layer-Device Extensions: count = 0

        GPU id  : 3 (NVIDIA RTX A6000)
        Layer-Device Extensions: count = 0

        GPU id  : 4 (NVIDIA RTX A6000)
        Layer-Device Extensions: count = 0

        GPU id  : 5 (NVIDIA RTX A6000)
        Layer-Device Extensions: count = 0

        GPU id  : 6 (NVIDIA RTX A6000)
        Layer-Device Extensions: count = 0

        GPU id  : 7 (NVIDIA RTX A6000)
        Layer-Device Extensions: count = 0

        GPU id  : 8 (NVIDIA RTX A6000)
        Layer-Device Extensions: count = 0

One of the devices (GPU id 1) does not appear to use the NVIDIA driver; it is llvmpipe instead. Selecting this device in any way leads to the segmentation fault.
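For anyone who wants to check their own machine, here is a minimal Python sketch that scans vulkaninfo-style text for the "GPU id" lines and flags software rasterizers. The helper name list_vulkan_devices and the substring list are my own assumptions, not part of Isaac Gym or this repo, and the sample text is a shortened copy of the output above. Note that Vulkan GPU ids may not correspond to CUDA device indices, so this only tells you whether a CPU-based device is present.

```python
import re

# Names that identify Mesa's CPU-based Vulkan implementations (assumption:
# matching on these substrings is enough for output like the one above).
SOFTWARE_RENDERERS = ("llvmpipe", "lavapipe")

# Matches lines such as "GPU id  : 1 (llvmpipe (LLVM 12.0.0, 256 bits))".
GPU_LINE = re.compile(r"GPU id\s*:\s*(\d+)\s*\((.*)\)\s*$")


def list_vulkan_devices(vulkaninfo_text):
    """Return (gpu_id, name, is_software) per 'GPU id' line, one entry per id."""
    devices = []
    seen = set()
    for line in vulkaninfo_text.splitlines():
        match = GPU_LINE.search(line)
        if match is None:
            continue
        gpu_id, name = int(match.group(1)), match.group(2)
        if gpu_id in seen:
            # The same id can be listed in more than one section of the report.
            continue
        seen.add(gpu_id)
        is_software = any(s in name.lower() for s in SOFTWARE_RENDERERS)
        devices.append((gpu_id, name, is_software))
    return devices


# Shortened copy of the device list posted in this thread.
sample = """\
Devices: count = 9
    GPU id  : 0 (NVIDIA RTX A6000)
    Layer-Device Extensions: count = 0

    GPU id  : 1 (llvmpipe (LLVM 12.0.0, 256 bits))
    Layer-Device Extensions: count = 0

    GPU id  : 2 (NVIDIA RTX A6000)
    Layer-Device Extensions: count = 0
"""

for gpu_id, name, is_software in list_vulkan_devices(sample):
    label = "software rasterizer" if is_software else "hardware GPU"
    print(f"Vulkan GPU id {gpu_id}: {name} [{label}]")
```

To run it on a real machine, the sample string could be replaced with the captured text of the vulkaninfo command.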

gemcollector commented 2 years ago

I also ran into this problem when using Isaac Gym. Have you managed to fix it?

cypypccpy commented 2 years ago

Hi @gemcollector ,

Isaac Gym currently only runs on NVIDIA GPUs. You can choose which GPU to run on (for example cuda:2) by adding --rl_device=cuda:2 --sim_device=cuda:2 to the command line at startup, which keeps Isaac Gym off a device that has no CUDA driver.
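As a rough illustration of the suggestion above, the following Python sketch assembles the training command with both device flags pinned to the same GPU. The helper build_train_command is hypothetical and only exists for this example; the task, algorithm, and flag names are the ones already used in this thread, and cuda:2 is just an example index that should be replaced with a device known to be a real NVIDIA GPU.

```python
import shlex
import sys


def build_train_command(task, algo, device):
    """Build the train.py argument list with both device flags set to one GPU."""
    return [
        sys.executable, "train.py",
        f"--task={task}",
        f"--algo={algo}",
        f"--rl_device={device}",   # device used by the RL algorithm
        f"--sim_device={device}",  # device used by the physics simulation
    ]


cmd = build_train_command("ShadowHandOver", "ppo", "cuda:2")

# Show the arguments as they would be typed after "python".
print(shlex.join(cmd[1:]))

# To actually launch training from the directory that contains train.py:
# import subprocess
# subprocess.run(cmd, check=True)
```

Keeping both flags on the same device mirrors the command given above; whether they may differ is not something this thread establishes.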

Hope this can help you.

cypypccpy commented 2 years ago

Hi @quantumiracle ,

Thank you for sharing; this information is very valuable!

quantumiracle commented 2 years ago

@gemcollector For now I just specify the device as suggested by @cypypccpy so that the device using llvmpipe is never selected. This is a temporary workaround.