boreshkinai / delta-interpolator

Apache License 2.0

Can't find libnvidia-ml.so.1 #10

Closed by Toctave 1 year ago

Toctave commented 1 year ago

Hello. When running the container on Ubuntu 22.04, I run into the following issue:

$ sudo nvidia-docker run -p 18888:8888 -p 16006:6006 -v ~/workspace/delta-interpolator:/workspace/delta-interpolator -t -d --shm-size="8g" --name delta_interpolator_$USER delta_interpolator:$USER
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #1: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.

The issue seems to be that libnvidia-ml.so.1 could not be found. The following packages supply it:

$ sudo apt-file find libnvidia-ml.so.1
libnvidia-compute-390: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-compute-418-server: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-compute-450-server: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-compute-470: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-compute-470-server: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-compute-510: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-compute-510-server: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-compute-515: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-compute-515-server: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
libnvidia-compute-525: /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1

I'm not sure which one should be installed, since I suppose it depends on the driver version that comes with the container. That being said, I guess the host machine dictates which driver version should be used since it must match the hardware. Being unfamiliar with Docker, I'm not sure what the correct way to make this work is.
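Since the error comes from nvidia-container-cli failing to load libnvidia-ml.so.1 on the host, one quick check is whether the host's dynamic loader can open that library at all. The following is a minimal sketch, not part of this repository; the helper name `can_load_library` is made up for illustration, and it only tests loadability, not whether the driver version matches the GPU.

```python
import ctypes


def can_load_library(name: str) -> bool:
    """Return True if the dynamic loader can open the named shared library.

    Hypothetical helper for illustration; it does not check driver versions.
    """
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library is missing or cannot be loaded.
        return False
    return True


if __name__ == "__main__":
    lib = "libnvidia-ml.so.1"
    if can_load_library(lib):
        print(f"{lib} is loadable on this host")
    else:
        print(f"{lib} is not loadable; the NVIDIA driver may be missing on this host")
```

If this reports that the library is not loadable, the problem is on the host side (driver not installed, or no GPU visible to the machine) rather than inside the container image.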

boreshkinai commented 1 year ago

It's a problem with the driver installation. Never use apt to install GPU drivers.

boreshkinai commented 1 year ago

Please follow NVIDIA's guidelines for driver installation, e.g. https://docs.nvidia.com/datacenter/tesla/tesla-installation-notes/index.html

Toctave commented 1 year ago

Ah right, thanks. This was due to running everything inside an Ubuntu 22.04 VM, and that VM doesn't support using NVIDIA GPUs. (My host system runs Ubuntu 22.10, which nvidia-docker doesn't support.) Sorry for bothering you with that one.
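For anyone hitting the same error, it can help to check early whether the machine is a virtual one. A rough sketch follows, assuming an x86 Linux guest: it looks for the `hypervisor` CPU flag in `/proc/cpuinfo`. The function name `has_hypervisor_flag` is hypothetical, and the check is only a heuristic, since some hypervisors hide this flag and a VM with GPU passthrough configured can still use the GPU.

```python
from pathlib import Path


def has_hypervisor_flag(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in cpuinfo text lists 'hypervisor'.

    Hypothetical helper; a heuristic for x86 Linux guests only.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Only inspect the CPU feature list, not e.g. the model name.
        if sep and key.strip() == "flags" and "hypervisor" in value.split():
            return True
    return False


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        if has_hypervisor_flag(cpuinfo.read_text()):
            print("This looks like a virtual machine; the GPU may not be exposed to it")
        else:
            print("No hypervisor flag found")
    else:
        print("/proc/cpuinfo not available on this system")
```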