NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes
Apache License 2.0

WSL2 - No devices found. Waiting indefinitely. #646

Closed by qingfengfenga 1 month ago

qingfengfenga commented 1 month ago


1. Quick Debug Information

2. Issue or feature description


The NVIDIA device plugin pod can run nvidia-smi successfully, but the plugin's logs report that no devices were found, so the GPU is never advertised to the cluster.

Detailed problem description

https://github.com/justinthelaw/k3d-gpu-support/issues/1

Reference

https://github.com/k3d-io/k3d/issues/1108#issuecomment-1616065479

3. Information to attach (optional if deemed irrelevant)

Common error checking:

NVIDIA-SMI-LOG.txt


elezar commented 1 month ago

@qingfengfenga there was some work done for WSL2 in the 0.15.0 release branch. Could you test using the 0.15.0-rc.2 version instead of 0.14.5?

dbreyfogle commented 1 month ago

Hi @qingfengfenga, I recently submitted a PR to k3d which updated the documentation for how to run CUDA workloads: https://k3d.io/v5.6.3/usage/advanced/cuda

The PR also updated the NVIDIA device plugin to 0.15.0-rc.2, as mentioned by @elezar. In my testing on WSL it worked without issues. Would you mind testing with the new docs to see if that fixes it?
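One way to confirm the result after switching versions is to check whether each node now advertises the `nvidia.com/gpu` resource. The sketch below is only an illustration, not part of the k3d docs or the device plugin: the helper names `gpu_allocatable` and `fetch_nodes` are hypothetical, and it assumes `kubectl` is installed and pointed at the k3d cluster.

```python
import json
import subprocess

# Extended resource name that the NVIDIA device plugin registers with the kubelet.
GPU_RESOURCE = "nvidia.com/gpu"


def gpu_allocatable(node_list: dict) -> dict:
    """Map node name -> allocatable GPU count.

    `node_list` is the parsed output of `kubectl get nodes -o json`.
    Nodes that do not advertise the resource are reported with 0.
    """
    counts = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports extended resource quantities as strings, e.g. "1".
        counts[name] = int(allocatable.get(GPU_RESOURCE, "0"))
    return counts


def fetch_nodes() -> dict:
    """Hypothetical helper: assumes kubectl is on PATH and targets the right cluster."""
    result = subprocess.run(
        ["kubectl", "get", "nodes", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `gpu_allocatable(fetch_nodes())` returns a count per node. A count of 0 on the GPU node would match the "No devices found" symptom, while a positive count suggests the plugin has registered the device.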

qingfengfenga commented 1 month ago

@elezar @dbreyfogle After switching to 0.15.0-rc.2, k3d on WSL2 runs CUDA workloads normally. Thank you for your work; we look forward to the official 0.15 release!