NVIDIA / libnvidia-container

NVIDIA container runtime library
Apache License 2.0

ldconfig-free deployment #234

Open SomeoneSerge opened 5 months ago

Hi! libnvidia-container currently relies on glibc internals when locating the host system's libraries, which limits its compatibility with the wider range of Linux distributions. nvidia-container-toolkit appears to provide limited support for static configuration, e.g. the ModeCSV used for Jetsons: https://github.com/NVIDIA/nvidia-container-toolkit/blob/a2262d00cc6d98ac2e95ae2f439e699a7d64dc17/pkg/nvcdi/lib.go#L98-L102, but many tools (e.g. Apptainer and SingularityCE) rely on libnvidia-container directly. I think it's desirable that libnvidia-container (also) support static configuration, whereby the user would specify, at build time or at runtime, a list of search paths in which to look for the userspace driver libraries.
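To illustrate the kind of static configuration I have in mind, here is a minimal Python sketch (not libnvidia-container code). The environment variable name `NVC_DRIVER_LIBRARY_PATH`, the built-in default list, and both function names are hypothetical; nothing like them exists in the library today, as far as I know.

```python
import glob
import os

# Hypothetical names: neither the variable nor the default exists upstream.
ENV_VAR = "NVC_DRIVER_LIBRARY_PATH"
BUILD_TIME_DEFAULT = "/usr/lib64:/usr/lib/x86_64-linux-gnu"


def driver_search_paths(environ=None):
    """Return the search path list: the runtime override if set, else the build-time default."""
    environ = os.environ if environ is None else environ
    raw = environ.get(ENV_VAR, BUILD_TIME_DEFAULT)
    # Colon-separated, like LD_LIBRARY_PATH; empty entries are ignored.
    return [entry for entry in raw.split(":") if entry]


def find_driver_libraries(patterns, search_paths):
    """Map each filename pattern to its matches in the first directory that has any."""
    found = {}
    for pattern in patterns:
        for directory in search_paths:
            # Escape the directory so only the filename part is treated as a glob.
            matches = sorted(glob.glob(os.path.join(glob.escape(directory), pattern)))
            if matches:
                found[pattern] = matches
                break  # earlier directories take precedence
    return found
```

With something like this, a distribution that keeps its driver libraries in a non-standard location would only have to set the list once, e.g. `find_driver_libraries(["libcuda.so.*"], driver_search_paths())`, and no ldconfig cache would be consulted at all.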

Motivation

At a glance, the stumbling blocks seem to be as follows:

Inspecting the dynamic loader's search paths and inferring the host system's libraries seems to be a valid need, and we should probably consult the maintainers of glibc (and/or other libc implementations) on how to approach it correctly. The optional /etc/ld.so.conf is only one of the tunables that affect ld.so's behaviour; others include LD_PRELOAD, LD_LIBRARY_PATH, and DT_RUNPATH. Rather than trying to approximate just a part of the dynamic loader's behaviour, we should probably use the loader itself. The only "public" interfaces I'm currently aware of are dlopen()+dlinfo() (allows code execution, albeit with the same privileges the parent process already has anyway) and ld.so --list (requires a test ELF binary as an argument). I think a ticket in glibc's issue tracker would be a reasonable step forward.
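For reference, the dlopen()+dlinfo() route mentioned above could look roughly like the Python/ctypes sketch below. It assumes a glibc-like libc: the value of `RTLD_DI_LINKMAP` and the leading fields of `struct link_map` are taken from glibc's `<dlfcn.h>` and `<link.h>`, and the function name `resolve_with_loader` is made up for this example. Note that loading the library runs its constructors, which is the code execution caveat.

```python
import ctypes
import os

RTLD_DI_LINKMAP = 2  # request code from glibc's <dlfcn.h> (assumed here)


class LinkMap(ctypes.Structure):
    # Public leading fields of glibc's `struct link_map` (<link.h>).
    _fields_ = [
        ("l_addr", ctypes.c_void_p),
        ("l_name", ctypes.c_char_p),
        ("l_ld", ctypes.c_void_p),
        ("l_next", ctypes.c_void_p),
        ("l_prev", ctypes.c_void_p),
    ]


def resolve_with_loader(soname):
    """Ask the dynamic loader where `soname` resolves to; return None on failure."""
    try:
        lib = ctypes.CDLL(soname)           # dlopen(): runs the library's constructors
        dlinfo = ctypes.CDLL(None).dlinfo   # look dlinfo() up in the running process
    except (OSError, AttributeError):
        return None
    dlinfo.argtypes = [ctypes.c_void_p, ctypes.c_int, ctypes.c_void_p]
    dlinfo.restype = ctypes.c_int
    link_map = ctypes.POINTER(LinkMap)()
    # dlinfo() stores a pointer to the library's link_map entry and returns 0 on success.
    if dlinfo(lib._handle, RTLD_DI_LINKMAP, ctypes.byref(link_map)) != 0 or not link_map:
        return None
    name = link_map.contents.l_name
    return os.fsdecode(name) if name else None
```

Calling `resolve_with_loader("libcuda.so.1")` on a host with the driver installed should then return whatever path the loader itself picked, with LD_LIBRARY_PATH, ld.so.conf and friends all taken into account, rather than a path guessed by re-implementing the lookup.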

Cf. also https://github.com/apptainer/apptainer/issues/1894, https://github.com/NixOS/nixpkgs/pull/279235, https://github.com/NVIDIA/nvidia-container-toolkit/issues/71

Thanks!