NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

Why is this value `@/sbin/ldconfig` rather than `/sbin/ldconfig`? #553

Open lengrongfu opened 1 week ago

lengrongfu commented 1 week ago

OS: Ubuntu 22.04
Kernel: 5.15.0-72-generic
GPU driver: host install, 535.54.03

I installed nvidia-container-toolkit through gpu-operator, but the nvidia-operator-validator pod fails with an error saying it couldn't find libnvidia-ml.so.

From what I could find, the problem may be related to @/sbin/ldconfig. On the host, /sbin/ldconfig is not a symbolic link. After I replaced /sbin/ldconfig with a symbolic link to /sbin/ldconfig.real, the nvidia-operator-validator pod ran successfully.

My question is: why is this value @/sbin/ldconfig? Could the code be improved to first check whether the file is a symbolic link?

https://github.com/NVIDIA/nvidia-container-toolkit/blob/6b78c72fec07f4f28b9a5f1072ec86af36fb75bb/internal/config/config.go#L131

lengrongfu commented 1 week ago

I think this function should be changed to check whether the file is a symlink:

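// getLdConfigPath returns the ldconfig path passed to NormalizeLDConfigPath.
func getLdConfigPath() string {
    filePath := "/sbin/ldconfig"
    // Lstat does not follow symlinks, so the mode describes the link itself.
    fileInfo, err := os.Lstat(filePath)
    // Use the @-prefixed path when the file cannot be inspected or is a symlink;
    // otherwise keep the plain path.
    if err != nil || fileInfo.Mode()&os.ModeSymlink != 0 {
        filePath = "@/sbin/ldconfig"
    }
    return NormalizeLDConfigPath(filePath)
}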