NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

why does nvidia-container-cli load libnvidia-ml via dlopen rather than linking directly? #223

Open · deitch opened this issue 1 year ago

deitch commented 1 year ago

Why does nvidia-container-cli load libnvidia-ml via dlopen() rather than linking against it directly? Because it uses dlopen(), the library has to be found on the dynamic loader's search path at runtime. This creates a few issues:

If I am running an OS that doesn't come with that package preinstalled, I am stuck. And there are lots of custom OS builds out there, different versions of an OS, etc.

Separately, if I did want to install it, how do I get it for other OSes, e.g. musl-based ones like Alpine? Or build it from source? I managed to build everything in this repo from source, including on Alpine, but it fails at runtime because of that libnvidia-ml dependency.
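To illustrate the failure mode described above, here is a minimal C sketch of loading NVML at runtime with dlopen()/dlsym(). It assumes the usual soname libnvidia-ml.so.1 and the NVML entry point nvmlInit_v2. The helper load_and_init_nvml is hypothetical and is not the code nvidia-container-cli actually uses.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stdio.h>

/* NVML's nvmlReturn_t is an enum in which 0 means NVML_SUCCESS; an
 * int-returning pointer type is used here so no NVML header is needed. */
typedef int (*nvml_init_fn)(void);

/* Hypothetical helper: load an NVML library by soname and call its
 * nvmlInit_v2(). Returns 0 on success and stores the open handle in
 * *out_handle; returns -1 if the library or symbol cannot be found, or
 * the nonzero NVML status if initialisation itself fails. */
static int load_and_init_nvml(const char *soname, void **out_handle)
{
    assert(soname != NULL && out_handle != NULL);
    *out_handle = NULL;

    /* dlopen() searches the dynamic loader's paths (e.g. LD_LIBRARY_PATH
     * and, on glibc, /etc/ld.so.cache); if the library is not there,
     * this is where the failure shows up. */
    void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s): %s\n", soname, dlerror());
        return -1;
    }

    nvml_init_fn init;
    dlerror(); /* clear any stale error before calling dlsym() */
    *(void **)(&init) = dlsym(handle, "nvmlInit_v2");
    if (init == NULL) {
        fprintf(stderr, "dlsym(nvmlInit_v2): %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    int status = init();
    if (status != 0) {
        dlclose(handle);
        return status;
    }
    *out_handle = handle;
    return 0;
}
```

A caller would pass "libnvidia-ml.so.1". On success, it would later resolve and call nvmlShutdown() the same way before dlclose(). This is the trade-off behind the question. With dlopen(), a missing library is an error the program can detect and report. With direct linking (-lnvidia-ml), the dynamic loader refuses to start the binary at all when the library is absent. On glibc older than 2.34, link with -ldl.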

ubuntuyeah commented 1 year ago

What OS did you build on and what packages did you install?

deitch commented 1 year ago

I compiled on an Ubuntu-based system, but I plan to run it on Alpine and possibly on a custom-composed OS, which may have no package manager at all.

SomeoneSerge commented 11 months ago

> Separately, if I did want to install it, how do I get it for other OSes, e.g. musl-based like Alpine?

I think this might actually be a larger-scale issue. Since NVIDIA distributes most of its drivers and libraries in binary form, don't most of those also come linked only against glibc? E.g. the CUDA libraries, except on some select Jetson platforms?
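One way to check which C library a prebuilt executable expects is to read its ELF program interpreter. On x86_64, glibc executables typically request /lib64/ld-linux-x86-64.so.2, and musl executables request /lib/ld-musl-x86_64.so.1. The sketch below is a minimal C version. It assumes Linux, <elf.h>, and a 64-bit ELF file in the host's byte order. The function name read_elf_interp is hypothetical. Shared libraries such as libnvidia-ml.so usually have no interpreter entry, so for them one would inspect the DT_NEEDED entries instead (e.g. with readelf -d).

```c
#include <assert.h>
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the PT_INTERP path (the dynamic loader an
 * executable asks for) of a 64-bit, host-endian ELF file into buf.
 * Returns 0 on success, or -1 on I/O error, a non-ELF64 file, or when
 * there is no PT_INTERP segment (static binaries, most .so files). */
static int read_elf_interp(const char *path, char *buf, size_t buflen)
{
    assert(path != NULL && buf != NULL);
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;

    int rc = -1;
    Elf64_Ehdr eh;
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_phentsize != sizeof(Elf64_Phdr))
        goto out;

    /* Walk the program headers looking for the PT_INTERP segment. */
    for (unsigned i = 0; i < eh.e_phnum; i++) {
        Elf64_Phdr ph;
        long off = (long)(eh.e_phoff + (Elf64_Off)i * sizeof ph);
        if (fseek(f, off, SEEK_SET) != 0 || fread(&ph, sizeof ph, 1, f) != 1)
            goto out;
        if (ph.p_type != PT_INTERP)
            continue;
        /* p_filesz counts the terminating NUL byte. */
        if (ph.p_filesz == 0 || ph.p_filesz > buflen)
            goto out;
        if (fseek(f, (long)ph.p_offset, SEEK_SET) != 0 ||
            fread(buf, 1, ph.p_filesz, f) != ph.p_filesz)
            goto out;
        buf[ph.p_filesz - 1] = '\0';
        rc = 0;
        break;
    }
out:
    fclose(f);
    return rc;
}
```

On an Ubuntu x86_64 machine, calling read_elf_interp("/bin/ls", ...) typically yields /lib64/ld-linux-x86-64.so.2. On Alpine, the same call yields /lib/ld-musl-x86_64.so.1.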

> dlopen rather than linking directly?

I'm not sure this would work in the case of libnvidia-ml.so, because it's part of the "userspace driver" and pins the kernel module version?

```
❯ strings /run/opengl-driver/lib/libnvidia-ml.so | grep "API mismatch"
NVIDIA: API mismatch: the NVIDIA kernel module has version %s,
NVIDIA: API mismatch: this NVIDIA driver component has version
```
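The "API mismatch" strings point to a version check between the userspace library and the loaded kernel module. The C sketch below illustrates that kind of pairing. It does not claim to reproduce how NVML performs its own check. It assumes the kernel module reports its version in /proc/driver/nvidia/version, whose first line typically looks like "NVRM version: NVIDIA UNIX x86_64 Kernel Module  535.104.05  ...". It also assumes the userspace library file carries the same version as a suffix, as in libnvidia-ml.so.535.104.05. The helpers parse_nvrm_version and lib_matches_version are hypothetical, and the version numbers are arbitrary examples.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: extract the driver version from an NVRM line.
 * It takes the first whitespace-separated token after "Kernel Module"
 * that starts with a digit and contains a '.', which skips words such
 * as "for" and "x86_64" in the open-kernel-module variant of the line.
 * Writes the version into out and returns 0, or returns -1. */
static int parse_nvrm_version(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL && outlen > 0);
    const char *p = strstr(line, "Kernel Module");
    if (p == NULL)
        return -1;
    p += strlen("Kernel Module");

    while (*p != '\0') {
        while (isspace((unsigned char)*p))
            p++;
        const char *start = p;
        while (*p != '\0' && !isspace((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        if (len > 0 && isdigit((unsigned char)start[0]) &&
            memchr(start, '.', len) != NULL) {
            if (len >= outlen)
                return -1;
            memcpy(out, start, len);
            out[len] = '\0';
            return 0;
        }
    }
    return -1;
}

/* Hypothetical helper: return 1 if a file name such as
 * "libnvidia-ml.so.535.104.05" carries exactly the given version. */
static int lib_matches_version(const char *libname, const char *version)
{
    const char *prefix = "libnvidia-ml.so.";
    size_t n = strlen(prefix);
    return strncmp(libname, prefix, n) == 0 && strcmp(libname + n, version) == 0;
}
```

If the two versions differ, the userspace library and the kernel module come from different driver releases. That is the situation the mismatch messages above describe.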

Also, unlike libcuda.so, libnvidia-ml.so doesn't come with "stub" libraries, I believe.

deitch commented 11 months ago

> except for some chosen jetson platforms?

Actually, the Jetson platform (the "official OS", anyway) is glibc-based.

> I'm not sure this would work in case of libnvidia-ml.so, because it's part of the "userspace driver" and pins the kernel module version?

How interesting. libnvidia-ml.so is pinned to a specific kernel version? As you point out, it is userspace, and userspace libraries usually aren't pinned to a kernel version.

> Unlike libcuda.so, libnvidia-ml.so also comes without the "stub" libraries I believe

What do you mean?