NVIDIA / deepops

Tools for building GPU clusters
BSD 3-Clause "New" or "Revised" License

Isaac Sim container - missing Nvidia Vulkan library #1223

Closed. cocakohler closed this issue 1 year ago.

cocakohler commented 2 years ago

I deployed DeepOps K8s on a single node with 4 RTX A6000 GPUs. GPU-accelerated containers (e.g. nvidia-smi) run without any issues, but I'm running into problems with the NVIDIA Isaac Sim container https://catalog.ngc.nvidia.com/orgs/nvidia/containers/isaac-sim. When I start the container, I get the following error: Fatal Error: Can't find libGLX_nvidia.so.0

As far as I understand, libGLX_nvidia.so.0 is part of NVIDIA's Vulkan support (https://developer.nvidia.com/vulkan), which should ship with the driver.
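One way to check this diagnosis on the host is to ask the dynamic linker cache whether it knows the library at all. This is a minimal sketch assuming a glibc-based host with `ldconfig` on the PATH; `check_lib` is a hypothetical helper name, not part of DeepOps or the driver.

```shell
#!/usr/bin/env bash
# Report whether a shared library is registered in the dynamic linker cache.
# Assumption: glibc host with ldconfig available; check_lib is a hypothetical helper.
check_lib() {
  local lib="$1"
  # ldconfig -p prints the cache; grep -F matches the name literally.
  # If ldconfig is unavailable, the cache listing is empty and the lib is reported missing.
  if ldconfig -p 2>/dev/null | grep -qF "$lib"; then
    echo "found: $lib"
  else
    echo "missing: $lib"
  fi
}

check_lib libGLX_nvidia.so.0
```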

When I run

gpu@gpu01:~$ dpkg-query -l | grep nvidia

I get the following output:

ii  libnvidia-cfg1-510-server:amd64            510.73.05-0ubuntu0.20.04.1          amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-compute-510-server:amd64         510.73.05-0ubuntu0.20.04.1          amd64        NVIDIA libcompute package
ii  nvidia-compute-utils-510-server            510.73.05-0ubuntu0.20.04.1          amd64        NVIDIA compute utilities
ii  nvidia-dkms-510-server                     510.73.05-0ubuntu0.20.04.1          amd64        NVIDIA DKMS package
ii  nvidia-headless-510-server                 510.73.05-0ubuntu0.20.04.1          amd64        NVIDIA headless metapackage
ii  nvidia-headless-no-dkms-510-server         510.73.05-0ubuntu0.20.04.1          amd64        NVIDIA headless metapackage - no DKMS
ii  nvidia-kernel-common-510-server            510.73.05-0ubuntu0.20.04.1          amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-510-server            510.73.05-0ubuntu0.20.04.1          amd64        NVIDIA kernel source package
ii  nvidia-utils-510-server                    510.73.05-0ubuntu0.20.04.1          amd64        NVIDIA Server Driver support binaries
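The listing has the compute and utility packages for the 510-server branch but no `libnvidia-gl-*` package, which on Ubuntu is where the GL/GLX libraries are packaged. The sketch below derives the GL package name matching the installed branch and asks dpkg whether it is installed. It assumes Ubuntu's `<name>-<branch>[-server]` naming seen in the output above; `gl_package_for` and `is_installed` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Map an installed nvidia-utils package to the GL package of the same branch,
# e.g. nvidia-utils-510-server -> libnvidia-gl-510-server.
# Assumption: Ubuntu package naming as in the dpkg-query listing above.
gl_package_for() {
  local utils_pkg="$1"
  echo "libnvidia-gl-${utils_pkg#nvidia-utils-}"
}

# Succeed only if dpkg reports the package as fully installed.
# If dpkg-query is missing or the package is unknown, this returns failure.
is_installed() {
  dpkg-query -W -f='${Status}' "$1" 2>/dev/null | grep -q 'install ok installed'
}

# Branch taken from the listing in this thread (hypothetical hard-coding).
pkg="$(gl_package_for nvidia-utils-510-server)"
if is_installed "$pkg"; then
  echo "$pkg is installed"
else
  echo "$pkg is not installed"
fi
```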

I'm afraid of doing a manual update of the driver package. Any ideas?

ajdecon commented 2 years ago

Please try installing the libnvidia-gl-510-server package in addition to the current packages.

Many of the OpenGL files are broken out into their own packages, and we don't install these by default in DeepOps because our clusters are mostly not used for graphical workloads.

cocakohler commented 2 years ago

@ajdecon So running sudo apt install libnvidia-gl-510-server is the way to go? Do you expect any side effects from doing that?

ajdecon commented 2 years ago

That's my suggestion based on the error presented. I'm not experienced with Isaac Sim, so I don't know for sure that this will work. 😄

I don't expect any side effects from the extra package(s) being installed.
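On the side-effects question, `apt-get -s` (simulate) prints the planned transaction without changing the system, so any removals can be spotted before a real install. A minimal sketch; `planned_removals` is a hypothetical helper, and it assumes apt-get's simulation format in which planned removals are printed as lines starting with `Remv`.

```shell
#!/usr/bin/env bash
# Keep only the "Remv" lines from apt-get simulation output read on stdin.
# Assumption: apt-get prints planned removals as lines starting with "Remv ".
planned_removals() {
  grep '^Remv ' || true   # grep exits 1 on no match; treat that as "nothing to remove"
}

pkg="libnvidia-gl-510-server"   # package suggested in this thread
# -s simulates the transaction; nothing is installed or removed.
if sim="$(apt-get -s install "$pkg" 2>/dev/null)"; then
  removals="$(printf '%s\n' "$sim" | planned_removals)"
  if [[ -z "$removals" ]]; then
    echo "simulation for $pkg: no packages would be removed"
  else
    printf 'simulation for %s would remove:\n%s\n' "$pkg" "$removals"
  fi
else
  echo "simulation for $pkg failed (package not found or apt error)"
fi
```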

github-actions[bot] commented 1 year ago

This issue is stale because it has been open for 60 days with no activity. Please update the issue or it will be closed in 7 days.

cocakohler commented 1 year ago

Running sudo apt install libnvidia-gl-510-server fixed the problem.
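After the install, `dpkg -S` reports which installed package owns a file, which confirms where the library came from. A sketch under the assumption that each output line has the single-owner form `package:arch: path`; `owner_of` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Extract the package name from one line of `dpkg -S` output, e.g.
# "libnvidia-gl-510-server:amd64: /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0"
#   -> "libnvidia-gl-510-server"
# Assumption: a single owning package per line.
owner_of() {
  local line="$1"
  local pkg="${line%%: *}"   # drop the ": /path" suffix
  echo "${pkg%%:*}"          # drop the ":arch" qualifier if present
}

# Take the first matching line; empty if dpkg is absent or no package owns the file.
line="$(dpkg -S libGLX_nvidia.so.0 2>/dev/null | head -n1)"
if [[ -n "$line" ]]; then
  echo "libGLX_nvidia.so.0 is provided by $(owner_of "$line")"
else
  echo "no installed package provides libGLX_nvidia.so.0"
fi
```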