Closed: davidshen84 closed this issue 1 year ago.
I also tried the nvgfd/gpu-feature-discovery
chart directly and observed the same behaviour.
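For reference, this is roughly how I installed it; the repo URL is the one from the gpu-feature-discovery README, and the release and namespace names here are just examples:

```shell
# Add the standalone GPU Feature Discovery chart repo and install the chart.
# (Repo URL taken from the GFD project README; release/namespace names are examples.)
helm repo add nvgfd https://nvidia.github.io/gpu-feature-discovery
helm repo update
helm upgrade -i gpu-feature-discovery nvgfd/gpu-feature-discovery \
  --namespace gpu-feature-discovery --create-namespace
```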
I see. It is because NFD did not find any NVIDIA-related device on the node; WSL does not expose the hardware information correctly.
Even if I add the `nvidia.com/gpu.present` label to the node to force GFD to be deployed there, I get this error in the pod:
Error getting machine type from /sys/class/dmi/id/product_name: could not open machine type file: open /sys/class/dmi/id/product_name: no such file or directory
Because there's no dmi in WSL. https://github.com/microsoft/WSL/issues/4391
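For completeness, the label was added by hand along these lines (the node name is a placeholder):

```shell
# Manually mark the WSL2 node as having an NVIDIA GPU present.
kubectl label node <wsl2-node-name> nvidia.com/gpu.present=true
```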
nvm...that error message is actually a warning. I can see the nvidia.com/*
labels on my WSL node.
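Something like the following lists them (node name is a placeholder):

```shell
# Show only the nvidia.com/* labels that GFD applied to the node.
kubectl get node <wsl2-node-name> --show-labels | tr ',' '\n' | grep nvidia.com/
```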
Hi,
My k8s master node lives in a WSL2 instance. I configured the Nvidia container runtime and I am able to access the GPU in my WSL2 environment.
I installed GFD using the `nvidia-device-plugin` helm chart with values that enable the GFD sub-chart. Regarding the 8b416016 tag, please refer to https://github.com/NVIDIA/k8s-device-plugin/issues/332.
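Roughly, the relevant part of the values file looks like this; `gfd.enabled` is the sub-chart toggle in that chart, and the exact placement of the tag override is from memory, so it may differ from what I actually used:

```yaml
# values.yaml (sketch, not the verbatim file I used)
gfd:
  enabled: true        # enable the gpu-feature-discovery sub-chart
image:
  tag: "8b416016"      # see https://github.com/NVIDIA/k8s-device-plugin/issues/332
```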
After the chart is applied, I can see that an `nvdp-node-feature-discovery-master` pod and an `nvdp-node-feature-discovery-worker` pod are created and running. No apparent errors were noticed in either of the pods. However, there is no trace of the `nvdp-gpu-feature-discovery` pod. I can confirm that I can run pods with GPU resource requests.
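For example, a pod along these lines is what I mean by a GPU resource request (the image name is just an example of a CUDA-enabled image):

```yaml
# Minimal smoke-test pod requesting one GPU from the device plugin.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04  # example image
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```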
I have another Gentoo Linux machine that joins my k8s cluster as an agent node. Upon joining, the relevant pods (including the GFD pod) are deployed to this node and the GPU feature labels are added to it.
I think the NFD master pod believes the WSL2 node does not have a GPU and therefore decided not to deploy the GFD pod to this node at all.
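One way to check that theory is to compare the GFD daemonset's node selector with the labels on the two nodes (resource, namespace, and node names below are placeholders):

```shell
# Inspect the node selector on the GFD daemonset.
kubectl -n <namespace> get daemonset <nvdp-gpu-feature-discovery> \
  -o jsonpath='{.spec.template.spec.nodeSelector}'

# Compare the NFD-derived PCI labels on the two nodes (10de is NVIDIA's PCI vendor ID).
kubectl get node <gentoo-node> --show-labels | tr ',' '\n' | grep pci-10de
kubectl get node <wsl2-node>   --show-labels | tr ',' '\n' | grep pci-10de
```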