NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

`/etc/ld.so.cache` is wrong running ARM container on x86 host #423

Open trxcllnt opened 6 months ago

trxcllnt commented 6 months ago

Possibly related: https://github.com/NVIDIA/nvidia-container-toolkit/issues/373, https://github.com/NVIDIA/nvidia-container-toolkit/issues/123

It looks like the ldcache is incorrect when emulating an ARM container on an x86 host:

$ docker run --rm -it --platform linux/amd64 --gpus all ubuntu:22.04 ldconfig -p 2>/dev/null | grep libstdc++ && echo $? || echo $?
    libstdc++.so.6 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libstdc++.so.6
0
$ docker run --rm -it --platform linux/arm64 --gpus all ubuntu:22.04 ldconfig -p 2>/dev/null | grep libstdc++ && echo $? || echo $?
1

Without the --gpus all flag, the same commands produce the correct output:

$ docker run --rm -it --platform linux/amd64 ubuntu:22.04 ldconfig -p 2>/dev/null | grep libstdc++ && echo $? || echo $?
    libstdc++.so.6 (libc6,x86-64) => /lib/x86_64-linux-gnu/libstdc++.so.6
0
$ docker run --rm -it --platform linux/arm64 ubuntu:22.04 ldconfig -p 2>/dev/null | grep libstdc++ && echo $? || echo $?
    libstdc++.so.6 (libc6,AArch64) => /lib/aarch64-linux-gnu/libstdc++.so.6
0
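
To check a cache for a given architecture rather than a single library, the `ldconfig -p` output can be scanned for its architecture tag. This is a minimal sketch that assumes the tag format shown in the output above (`libc6,x86-64`, `libc6,AArch64`). `ldcache_has_arch` is a hypothetical helper name, not part of the toolkit:

```shell
# Hypothetical helper: read `ldconfig -p` output on stdin and succeed if at
# least one entry carries the given architecture tag (e.g. x86-64, AArch64).
# Entry lines are assumed to look like:
#   libfoo.so.1 (libc6,AArch64) => /lib/aarch64-linux-gnu/libfoo.so.1
ldcache_has_arch() {
  # Match "(libc6,ARCH)" as well as "(libc6,ARCH, ...)" as fixed strings.
  grep -qF -e "(libc6,$1)" -e "(libc6,$1,"
}

# Example use (requires Docker and a GPU-enabled host, so it is left commented):
#   docker run --rm --platform linux/arm64 --gpus all ubuntu:22.04 ldconfig -p \
#     | ldcache_has_arch AArch64 || echo "no AArch64 entries in ld.so.cache"
```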

elezar commented 6 months ago

@trxcllnt the NVIDIA Container Toolkit injects GPU driver libraries from the host into the container. With this in mind, it is unlikely that emulated containers will work as expected.

(That said, the code that runs ldconfig may still need some adjustment. Could you please confirm your NVIDIA Container Toolkit version?)
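
Because the injected libraries come from the host, a wrapper script could warn before combining --gpus all with a non-native --platform. This is a minimal sketch: both function names are hypothetical, and the mapping covers only the two architectures discussed in this issue:

```shell
# Hypothetical helper: map a `uname -m` machine name to a Docker platform string.
machine_to_platform() {
  case "$1" in
    x86_64|amd64)  echo "linux/amd64" ;;
    aarch64|arm64) echo "linux/arm64" ;;
    *)             echo "unknown" ;;
  esac
}

# Hypothetical helper: succeed only if the requested platform is the host's native one.
platform_is_native() {
  [ "$(machine_to_platform "$(uname -m)")" = "$1" ]
}

# Example use (left commented so the snippet has no side effects):
#   platform_is_native linux/arm64 \
#     || echo "warning: host driver libraries will not match an emulated container" >&2
```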

trxcllnt commented 6 months ago

$ apt policy nvidia-container-toolkit
nvidia-container-toolkit:
  Installed: 1.14.6-1
  Candidate: 1.14.6-1
  Version table:
 *** 1.14.6-1 600
        600 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64  Packages