NVIDIA / gpu-feature-discovery

GPU plugin to the node feature discovery for Kubernetes
Apache License 2.0

fail to create nodefeature #47

Closed ejlee125 closed 1 year ago

ejlee125 commented 1 year ago

Hello, I am trying to set up Kubernetes on MIG GPUs with nvidia-device-plugin and gpu-feature-discovery. I installed both with Helm 3, and "kubectl describe node" shows "nvidia-com:mig-~~" in the Capacity and Allocatable sections. The "feature.node.kubernetes.io/cpu-" items are also listed in the Labels section. But I cannot see any labels that start with "nvidia.com".

The gpu-node-feature pod also shows this error:

E0629 02:05:03.783259       1 main.go:95] failed to create NodeFeature object "nvidia-features-for-": NodeFeature.nfd.k8s-sigs.io "nvidia-features-for-" is invalid: metadata.name: Invalid value: "nvidia-features-for-": a lowercase RFC 1123 subdomain must consist of lower case alphanumeric characters, '-' or '.', and must start and end with an alphanumeric character (e.g. 'example.com', regex used for validation is '[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*')
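
For readers puzzling over the validation message: the object name in the log, "nvidia-features-for-", ends in a hyphen, and that alone is enough for the API server to reject it. Below is a minimal sketch that applies the regex quoted in the error to a candidate name. The helper name `is_valid_object_name` and the example node suffix `worker-1` are hypothetical, chosen only for illustration; the 253-character cap is the usual Kubernetes limit for DNS subdomain names.

```python
import re

# Regex copied verbatim from the error message (RFC 1123 subdomain).
RFC1123_SUBDOMAIN = re.compile(
    r"[a-z0-9]([-a-z0-9]*[a-z0-9])?(\.[a-z0-9]([-a-z0-9]*[a-z0-9])?)*"
)

# Kubernetes caps DNS subdomain names at 253 characters.
MAX_SUBDOMAIN_LENGTH = 253


def is_valid_object_name(name: str) -> bool:
    """Return True if `name` is a valid RFC 1123 subdomain (hypothetical helper)."""
    if len(name) > MAX_SUBDOMAIN_LENGTH:
        return False
    # fullmatch requires the whole string to match, like the server-side check.
    return RFC1123_SUBDOMAIN.fullmatch(name) is not None


if __name__ == "__main__":
    # The name from the log ends with '-', so it does not validate.
    print(is_valid_object_name("nvidia-features-for-"))
    # A name with a (hypothetical) node name appended does validate.
    print(is_valid_object_name("nvidia-features-for-worker-1"))
```

This only reproduces the name check; it says nothing about why the name was generated without a suffix.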

gpu-feature-discovery: 0.8.0
nvidia-device-plugin: 0.12.0

How can I fix this?

ejlee125 commented 1 year ago

I found that the cause of the failure was a ClusterRole issue in gpu-feature-discovery. After I added the "create" verb to the nodefeatures resource in the gpu-feature-discovery ClusterRole, the labels for "feature.node" were listed on the node successfully.
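
The fix described above can be sketched as a small transformation on a ClusterRole manifest loaded as a plain dictionary. This is only an illustration under stated assumptions: the helper `ensure_verb`, the role name `gpu-feature-discovery`, and the pre-existing verbs in the sample are hypothetical and not taken from the actual chart. The API group `nfd.k8s-sigs.io` and the resource `nodefeatures` come from the error message in the first comment.

```python
import copy


def ensure_verb(cluster_role: dict, api_group: str, resource: str, verb: str) -> dict:
    """Return a copy of the manifest in which `verb` is granted on `resource`.

    Hypothetical helper: it edits the first rule covering the given API group
    and resource, or appends a new rule if no such rule exists.
    """
    updated = copy.deepcopy(cluster_role)
    rules = updated.setdefault("rules", [])
    for rule in rules:
        if api_group in rule.get("apiGroups", []) and resource in rule.get("resources", []):
            verbs = rule.setdefault("verbs", [])
            if verb not in verbs:
                verbs.append(verb)
            return updated
    # No matching rule found: add one that grants only the requested verb.
    rules.append({"apiGroups": [api_group], "resources": [resource], "verbs": [verb]})
    return updated


# Sample manifest; the name and the existing verbs are assumptions for illustration.
sample_role = {
    "apiVersion": "rbac.authorization.k8s.io/v1",
    "kind": "ClusterRole",
    "metadata": {"name": "gpu-feature-discovery"},
    "rules": [
        {
            "apiGroups": ["nfd.k8s-sigs.io"],
            "resources": ["nodefeatures"],
            "verbs": ["get", "update"],
        }
    ],
}

patched_role = ensure_verb(sample_role, "nfd.k8s-sigs.io", "nodefeatures", "create")

if __name__ == "__main__":
    print(patched_role["rules"][0]["verbs"])
```

The function returns a modified copy rather than changing its input, so the original manifest can still be compared against the patched one before anything is applied to a cluster.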