Open Samiser opened 7 months ago
@Samiser could you run:
nvidia-ctk --debug cdi generate
I assume that the utility does not find libdxcore.so by itself, meaning that the mode needs to be explicitly set.
Note that we do use dlopen to load libdxcore.so, so you could try setting LD_PRELOAD=${PATH_TO_LIB}/libdxcore.so explicitly. This should help both the autodetection and the generation.
I would have to look at how to make this more robust.
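A quick way to see which directories a bare-name lookup of libdxcore.so could succeed in is sketched below. It only walks a colon-separated directory list (the form LD_LIBRARY_PATH takes) and prints the first hit. The real dynamic loader also consults RPATH/RUNPATH, the ld.so cache, and its default directories, so a miss here is a hint, not proof. The helper name find_lib is made up for illustration and is not part of nvidia-ctk.

```shell
#!/usr/bin/env bash
# find_lib NAME DIRLIST
# Prints the first path in the colon-separated DIRLIST that contains NAME.
# Hypothetical helper: a simplified stand-in for the loader's search order,
# not what nvidia-ctk or dlopen actually do internally.
find_lib() {
  local name=$1 dirlist=$2 dir
  local IFS=:
  for dir in $dirlist; do
    # Skip empty entries (the real loader treats them as the current directory).
    [ -n "$dir" ] || continue
    if [ -e "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
      return 0
    fi
  done
  return 1
}

# Example: search the current LD_LIBRARY_PATH plus the usual WSL library directory.
find_lib libdxcore.so "${LD_LIBRARY_PATH:-}:/usr/lib/wsl/lib" \
  || echo "libdxcore.so not found in the searched directories"
```

If this prints a path only once /usr/lib/wsl/lib is added to the list, the library is present but outside whatever the loader searches by default on that system.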
sure, here is the debug output both with --mode wsl and without:
~
❯ nvidia-ctk --debug cdi generate --mode wsl
DEBU[0000] Locating NVIDIA Container Toolkit CLI as nvidia-ctk
DEBU[0000] Locating "nvidia-ctk" in [/run/wrappers/bin /home/sam/.nix-profile/bin /nix/profile/bin /home/sam/.local/state/nix/profile/bin /etc/profiles/per-user/sam/bin /nix/var/nix/profiles/default/bin /run/current-system/sw/bin /home/sam/bin /usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /sbin /bin]
DEBU[0000] Checking candidate '/run/current-system/sw/bin/nvidia-ctk'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Found nvidia-ctk candidates: [/run/current-system/sw/bin/nvidia-ctk]
DEBU[0000] Using NVIDIA Container Toolkit CLI path nvidia-ctk
DEBU[0000] Locating /dev/dxg
DEBU[0000] Locating "/dev/dxg" in [/ /dev]
DEBU[0000] Checking candidate '/dev/dxg'
DEBU[0000] Located /dev/dxg as [/dev/dxg]
INFO[0000] Selecting /dev/dxg as /dev/dxg
ERRO[0000] failed to generate CDI spec: failed to create edits common for entities: failed to create discoverer for WSL driver: failed to initialize dxcore: failed to initialize dxcore context
~
❯ nvidia-ctk --debug cdi generate
DEBU[0000] Locating NVIDIA Container Toolkit CLI as nvidia-ctk
DEBU[0000] Locating "nvidia-ctk" in [/run/wrappers/bin /home/sam/.nix-profile/bin /nix/profile/bin /home/sam/.local/state/nix/profile/bin /etc/profiles/per-user/sam/bin /nix/var/nix/profiles/default/bin /run/current-system/sw/bin /home/sam/bin /usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /sbin /bin]
DEBU[0000] Checking candidate '/run/current-system/sw/bin/nvidia-ctk'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Found nvidia-ctk candidates: [/run/current-system/sw/bin/nvidia-ctk]
DEBU[0000] Using NVIDIA Container Toolkit CLI path nvidia-ctk
DEBU[0000] Is WSL-based system? false: could not load DXCore library: libdxcore.so: cannot open shared object file: No such file or directory
DEBU[0000] Is NVML-based system? false: could not load NVML library: libnvidia-ml.so.1: cannot open shared object file: No such file or directory
DEBU[0000] Is Tegra-based system? false: /sys/devices/soc0/family file not found
INFO[0000] Auto-detected mode as "nvml"
ERRO[0000] failed to generate CDI spec: failed to create device CDI specs: failed to initialize NVML: ERROR_LIBRARY_NOT_FOUND
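In the run above, none of the probes can load a library, so auto-detection ends up on "nvml" and then fails. A small pre-check like the one below can decide whether to pass --mode wsl explicitly. It keys off the presence of /dev/dxg, which is my own heuristic and not the toolkit's detection logic; the function name mode_flag and its optional root argument are hypothetical and exist only to make the sketch testable.

```shell
#!/usr/bin/env bash
# mode_flag [ROOT]
# Prints "--mode wsl" when ROOT/dev/dxg exists, and nothing otherwise.
# Heuristic for illustration only; NOT how nvidia-ctk detects WSL.
# ROOT defaults to the empty string, i.e. the real filesystem.
mode_flag() {
  local root=${1:-}
  if [ -e "$root/dev/dxg" ]; then
    echo "--mode wsl"
  fi
}

# Print the command that would be run instead of running it.
echo "nvidia-ctk --debug cdi generate $(mode_flag)"
```

Note that this only selects the mode. As the --mode wsl output shows, libdxcore.so still has to be loadable for generation to succeed.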
also, setting LD_PRELOAD doesn't seem to help:
~
❯ ls /usr/lib/wsl/lib/libdxcore.so
/usr/lib/wsl/lib/libdxcore.so
~
❯ LD_PRELOAD=/usr/lib/wsl/lib/libdxcore.so nvidia-ctk --debug cdi generate --mode wsl
DEBU[0000] Locating NVIDIA Container Toolkit CLI as nvidia-ctk
DEBU[0000] Locating "nvidia-ctk" in [/run/wrappers/bin /home/sam/.nix-profile/bin /nix/profile/bin /home/sam/.local/state/nix/profile/bin /etc/profiles/per-user/sam/bin /nix/var/nix/profiles/default/bin /run/current-system/sw/bin /home/sam/bin /usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /sbin /bin]
DEBU[0000] Checking candidate '/run/current-system/sw/bin/nvidia-ctk'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Found nvidia-ctk candidates: [/run/current-system/sw/bin/nvidia-ctk]
DEBU[0000] Using NVIDIA Container Toolkit CLI path nvidia-ctk
DEBU[0000] Locating /dev/dxg
DEBU[0000] Locating "/dev/dxg" in [/ /dev]
DEBU[0000] Checking candidate '/dev/dxg'
DEBU[0000] Located /dev/dxg as [/dev/dxg]
INFO[0000] Selecting /dev/dxg as /dev/dxg
ERRO[0000] failed to generate CDI spec: failed to create edits common for entities: failed to create discoverer for WSL driver: failed to initialize dxcore: failed to initialize dxcore context
also using the flag --library-search-path doesn't seem to help either:
~
❯ nvidia-ctk --debug cdi generate --mode wsl --library-search-path /usr/lib/wsl/lib/
DEBU[0000] Locating NVIDIA Container Toolkit CLI as nvidia-ctk
DEBU[0000] Locating "nvidia-ctk" in [/run/wrappers/bin /home/sam/.nix-profile/bin /nix/profile/bin /home/sam/.local/state/nix/profile/bin /etc/profiles/per-user/sam/bin /nix/var/nix/profiles/default/bin /run/current-system/sw/bin /home/sam/bin /usr/local/sbin /usr/local/bin /usr/sbin /usr/bin /sbin /bin]
DEBU[0000] Checking candidate '/run/current-system/sw/bin/nvidia-ctk'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Found nvidia-ctk candidates: [/run/current-system/sw/bin/nvidia-ctk]
DEBU[0000] Using NVIDIA Container Toolkit CLI path nvidia-ctk
DEBU[0000] Locating /dev/dxg
DEBU[0000] Locating "/dev/dxg" in [/ /dev]
DEBU[0000] Checking candidate '/dev/dxg'
DEBU[0000] Located /dev/dxg as [/dev/dxg]
INFO[0000] Selecting /dev/dxg as /dev/dxg
ERRO[0000] failed to generate CDI spec: failed to create edits common for entities: failed to create discoverer for WSL driver: failed to initialize dxcore: failed to initialize dxcore context
As mentioned in https://github.com/NixOS/nixpkgs/pull/312253 and https://github.com/nix-community/NixOS-WSL/issues/433, you should either use the wsl.useWindowsDriver option from NixOS-WSL or set LD_LIBRARY_PATH=/usr/lib/wsl/lib when generating the CDI spec.
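If LD_LIBRARY_PATH is already set in the environment, assigning it outright would drop the existing entries. A minimal sketch for prepending the WSL library directory instead is below; the helper name prepend_lib_dir is made up for illustration.

```shell
#!/usr/bin/env bash
# prepend_lib_dir DIR [CURRENT]
# Prints a library path value with DIR first, followed by CURRENT if it is non-empty.
# Hypothetical helper, not part of nvidia-ctk or NixOS-WSL.
prepend_lib_dir() {
  local dir=$1 current=${2:-}
  printf '%s\n' "${dir}${current:+:$current}"
}

# Show the value that would be used on this system.
prepend_lib_dir /usr/lib/wsl/lib "${LD_LIBRARY_PATH:-}"

# Possible usage (not executed here):
#   LD_LIBRARY_PATH=$(prepend_lib_dir /usr/lib/wsl/lib "${LD_LIBRARY_PATH:-}") \
#     nvidia-ctk cdi generate --mode wsl
```

The `${current:+:$current}` expansion adds the separating colon only when there is something to append, which avoids a trailing empty entry.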
➜ sudo LD_LIBRARY_PATH=/usr/lib/wsl/lib nvidia-ctk --debug cdi generate --mode wsl --output=/etc/cdi/nvidia.yaml
DEBU[0000] Locating NVIDIA Container Toolkit CLI as nvidia-ctk
DEBU[0000] Checking candidate '/usr/bin/nvidia-ctk'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Found nvidia-ctk candidates: [/usr/bin/nvidia-ctk]
DEBU[0000] Using NVIDIA Container Toolkit CLI path /usr/bin/nvidia-ctk
DEBU[0000] Inferred output format as "yaml" from output file name
DEBU[0000] Locating /dev/dxg
DEBU[0000] Checking candidate '/dev/dxg'
DEBU[0000] Located /dev/dxg as [/dev/dxg]
INFO[0000] Selecting /dev/dxg as /dev/dxg
INFO[0000] Using WSL driver store paths: [/usr/lib/wsl/drivers/iigd_dch.inf_amd64_73655f941b1dd71f /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6]
WARN[0000] Found multiple driver store paths: [/usr/lib/wsl/drivers/iigd_dch.inf_amd64_73655f941b1dd71f /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6]
DEBU[0000] Using specified NVIDIA Container Toolkit CLI path /usr/bin/nvidia-ctk
DEBU[0000] Locating libcuda.so.1.1
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda.so.1.1'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libcuda.so.1.1 as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda.so.1.1]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda.so.1.1 as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda.so.1.1
DEBU[0000] Locating libcuda_loader.so
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda_loader.so'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libcuda_loader.so as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda_loader.so]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda_loader.so as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda_loader.so
DEBU[0000] Locating libnvidia-ptxjitcompiler.so.1
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ptxjitcompiler.so.1'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libnvidia-ptxjitcompiler.so.1 as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ptxjitcompiler.so.1]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ptxjitcompiler.so.1 as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ptxjitcompiler.so.1
DEBU[0000] Locating libnvidia-ml.so.1
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml.so.1'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libnvidia-ml.so.1 as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml.so.1]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml.so.1 as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml.so.1
DEBU[0000] Locating libnvidia-ml_loader.so
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml_loader.so'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libnvidia-ml_loader.so as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml_loader.so]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml_loader.so as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml_loader.so
DEBU[0000] Locating libdxcore.so
DEBU[0000] Checking candidate '/usr/lib/wsl/lib/libdxcore.so'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libdxcore.so as [/usr/lib/wsl/lib/libdxcore.so]
INFO[0000] Selecting /usr/lib/wsl/lib/libdxcore.so as /usr/lib/wsl/lib/libdxcore.so
DEBU[0000] Locating nvcubins.bin
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvcubins.bin'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located nvcubins.bin as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvcubins.bin]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvcubins.bin as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvcubins.bin
DEBU[0000] Locating nvidia-smi
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvidia-smi'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located nvidia-smi as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvidia-smi]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvidia-smi as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvidia-smi
DEBU[0000] returning cached mounts
DEBU[0000] returning cached mounts
INFO[0000] Generated CDI spec with version 0.3.0
➜ nvidia-ctk cdi list
No help topic for 'list'
➜ podman run --rm -ti --device=nvidia.com/gpu=all ubuntu nvidia-smi
Error: stat nvidia.com/gpu=all: no such file or directory
i'm attempting to use nvidia-ctk to generate a CDI spec in WSL running NixOS, but am getting the following error:

if i generate the CDI spec on a different VM and use that config directly (only changing the location of nvidia-ctk) then nvidia-ctk successfully finds the device and i can use it in containers:

nvidia-container-toolkit.json
```
{
  "cdiVersion": "0.3.0",
  "containerEdits": {
    "hooks": [
      {
        "args": [
          "nvidia-ctk",
          "hook",
          "create-symlinks",
          "--link",
          "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvidia-smi::/usr/bin/nvidia-smi"
        ],
        "hookName": "createContainer",
        "path": "/run/current-system/sw/bin/nvidia-ctk"
      },
      {
        "args": [
          "nvidia-ctk",
          "hook",
          "update-ldcache",
          "--folder",
          "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69",
          "--folder",
          "/usr/lib/wsl/lib"
        ],
        "hookName": "createContainer",
        "path": "/run/current-system/sw/bin/nvidia-ctk"
      }
    ],
    "mounts": [
      {
        "containerPath": "/usr/lib/wsl/lib/libdxcore.so",
        "hostPath": "/usr/lib/wsl/lib/libdxcore.so",
        "options": [ "ro", "nosuid", "nodev", "bind" ]
      },
      {
        "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda.so.1.1",
        "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda.so.1.1",
        "options": [ "ro", "nosuid", "nodev", "bind" ]
      },
      {
        "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda_loader.so",
        "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libcuda_loader.so",
        "options": [ "ro", "nosuid", "nodev", "bind" ]
      },
      {
        "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml.so.1",
        "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml.so.1",
        "options": [ "ro", "nosuid", "nodev", "bind" ]
      },
      {
        "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml_loader.so",
        "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ml_loader.so",
        "options": [ "ro", "nosuid", "nodev", "bind" ]
      },
      {
        "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ptxjitcompiler.so.1",
        "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/libnvidia-ptxjitcompiler.so.1",
        "options": [ "ro", "nosuid", "nodev", "bind" ]
      },
      {
        "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvcubins.bin",
        "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvcubins.bin",
        "options": [ "ro", "nosuid", "nodev", "bind" ]
      },
      {
        "containerPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvidia-smi",
        "hostPath": "/usr/lib/wsl/drivers/nv_dispig.inf_amd64_1fea8972dc2f0a69/nvidia-smi",
        "options": [ "ro", "nosuid", "nodev", "bind" ]
      }
    ]
  },
  "devices": [
    {
      "containerEdits": {
        "deviceNodes": [
          { "path": "/dev/dxg" }
        ]
      },
      "name": "all"
    }
  ],
  "kind": "nvidia.com/gpu"
}
```

i've also tried populating every other flag with the locations of the files in /usr/lib/wsl/ but that didn't make a difference, i assume that's handled by --mode wsl
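For anyone repeating the "generate elsewhere, then change the nvidia-ctk location" workaround, the hook path rewrite can be scripted. The sketch below is one way to do it with sed, assuming the spec is JSON and every hook "path" value ends in /nvidia-ctk. The function name rewrite_ctk_path and the /usr/bin/nvidia-ctk starting path in the example are assumptions for illustration.

```shell
#!/usr/bin/env bash
# rewrite_ctk_path NEW_PATH
# Reads a CDI JSON spec on stdin and writes it to stdout with every
# "path": ".../nvidia-ctk" value replaced by NEW_PATH.
# Other "path" keys (such as the /dev/dxg device node) are left untouched.
# Hypothetical helper; NEW_PATH must not contain '#', '&' or backslashes,
# because it is substituted into the sed expression as-is.
rewrite_ctk_path() {
  local new=$1
  sed -E "s#(\"path\"[[:space:]]*:[[:space:]]*\")[^\"]*/nvidia-ctk\"#\1${new}\"#g"
}

# Example on a single hook entry (the input path here is an assumption).
printf '%s\n' '"path": "/usr/bin/nvidia-ctk"' \
  | rewrite_ctk_path /run/current-system/sw/bin/nvidia-ctk
```

A JSON-aware tool would be more robust than a regular expression if one is available; the sed version is only meant to show the single field that needs to change.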
here's the relevant nix config if it helps (omitting nixos-wsl import section):
and here's the gpu working with the manual config:
let me know if there's any more information i can provide!