NVIDIA / nvkind

Cluster creation fails on WSL2 #5

Open jdonkervliet opened 3 months ago

jdonkervliet commented 3 months ago

Hi,

I'm trying nvkind on WSL2, where the NVIDIA driver is installed in Windows and exposed to the Linux VM automagically. I followed all the steps listed in the requirements section and they all succeeded. However, when I run ./nvkind cluster create, the cluster is created and the post-processing steps install packages. During this process, I encounter the following error:

<log truncated for readability>
Setting up nvidia-container-toolkit-base (1.16.0~rc.2-1) ...
Setting up libnvidia-container1:amd64 (1.16.0~rc.2-1) ...
Setting up libnvidia-container-tools (1.16.0~rc.2-1) ...
Setting up nvidia-container-toolkit (1.16.0~rc.2-1) ...
Processing triggers for libc-bin (2.36-9+deb12u4) ...
time="2024-08-02T10:58:37Z" level=info msg="Loading config from /etc/containerd/config.toml"
time="2024-08-02T10:58:37Z" level=info msg="Wrote updated config to /etc/containerd/config.toml"
time="2024-08-02T10:58:37Z" level=info msg="It is recommended that containerd daemon be restarted."
umount: /proc/driver/nvidia: not found
F0802 12:58:38.097622   29676 main.go:45] Error: patching /proc/driver/nvidia on node 'nvkind-mz6kz-worker': running script on nvkind-mz6kz-worker: executing command: exit status 1

I guess the NVIDIA driver works differently on WSL2 than on a regular Linux host, so /proc/driver/nvidia may not be present. How can I work around this issue?
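
To help confirm that guess, here is a minimal diagnostic sketch. It is not part of nvkind and the function name is hypothetical. It only reports whether /proc/driver/nvidia (the path the failing umount targets) exists, and whether /dev/dxg exists. Treating /dev/dxg as the WSL2 GPU device is my assumption about how the driver is exposed there.

```python
import os


def gpu_paths_present(root="/"):
    """Report which GPU-related paths exist under `root`.

    Hypothetical helper for diagnosis only; nvkind does not provide it.
    """
    candidates = {
        # Path the failing `umount` in the log above refers to.
        "proc_driver_nvidia": "proc/driver/nvidia",
        # Assumption: WSL2 exposes the GPU through this device node.
        "dev_dxg": "dev/dxg",
    }
    return {
        name: os.path.exists(os.path.join(root, rel))
        for name, rel in candidates.items()
    }


if __name__ == "__main__":
    for name, present in gpu_paths_present().items():
        print(f"{name}: {'present' if present else 'missing'}")
```

Running it in the WSL2 distribution, or inside the kind node container, would show whether the path that umount targets exists there.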