NVIDIA / k8s-device-plugin

NVIDIA device plugin for Kubernetes
Apache License 2.0
2.65k stars 605 forks

Change GFD repository image V0.15.0 Helm #746

Open YFrendo opened 2 months ago

YFrendo commented 2 months ago

I don't see any option in the Helm chart to change the repository of the GFD image.

This would be useful on a company network where the cluster doesn't have any access to the internet.

ArangoGutierrez commented 2 months ago

We do have https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/helm/nvidia-device-plugin/values.yaml#L48, isn't this what you are looking for?

YFrendo commented 2 months ago

Yeah, it changes the k8s-device-plugin repository but not the GFD repository; the deployment tries to pull the GFD pod image from the default repository.

ArangoGutierrez commented 2 months ago

They share the same image (https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/container/Dockerfile.ubuntu#L72), and we have https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/helm/nvidia-device-plugin/templates/daemonset-gfd.yml#L134

YFrendo commented 2 months ago

Thanks! I will try this tomorrow, but I think we can close this issue!

ArangoGutierrez commented 2 months ago

I'll close it once it works for you :), not before.

Archimonde666 commented 2 months ago

Actually, I'm working with @YFrendo to deploy this plugin on a brand-new air-gapped k8s GPU infrastructure, and I did override this setting to point at our mirrored image hub, which worked for the k8s-device-plugin. This seems to be working.

The issue is indeed related to the node feature discovery, which is deployed from a separate Helm chart located here: https://github.com/NVIDIA/k8s-device-plugin/tree/main/deployments/helm/nvidia-device-plugin/charts. When I try to enable GFD from the device plugin Helm chart (here: https://github.com/NVIDIA/k8s-device-plugin/blob/925be6d97361359803eb6502d15fa3e69dbe6e2b/deployments/helm/nvidia-device-plugin/values.yaml#L106C3-L106C17), the created pods try to pull an image from registry.k8s.io/nfd/node-feature-discovery:v0.15.3 or something like that, and so far I haven't managed to override that image location, neither from the device plugin chart itself nor when I use the separate Helm chart and override it with the appropriate value (the values template is inside https://github.com/NVIDIA/k8s-device-plugin/blob/main/deployments/helm/nvidia-device-plugin/charts/node-feature-discovery-chart-0.15.3.tgz).

elezar commented 2 months ago

@YFrendo if you are able to install NFD separately, you could pass --set nfd.enabled=false when installing the device plugin and/or GFD. This should disable the internal NFD dependency.
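
As a sketch of the same suggestion in a values file rather than on the command line, the fragment below turns off the bundled NFD dependency, keeps GFD enabled, and points the image shared by the device plugin and GFD at a mirror. The registry host `registry.example.internal` and the file name are hypothetical placeholders; `image.repository` and `gfd.enabled` are assumed to match the chart's values.yaml linked earlier in this thread, and `nfd.enabled` is the setting named in the comment above.

```yaml
# values-airgap.yaml (hypothetical file name) for the nvidia-device-plugin chart
image:
  # Hypothetical internal mirror of the image shared by the device plugin and GFD
  repository: registry.example.internal/nvidia/k8s-device-plugin
gfd:
  # Deploy GPU Feature Discovery alongside the device plugin
  enabled: true
nfd:
  # Skip the bundled NFD dependency; assumes NFD is installed separately
  enabled: false
```

A file like this would typically be passed to Helm with the -f flag when installing or upgrading the release.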

YFrendo commented 2 months ago

> @YFrendo if you are able to install NFD separately, you could pass --set nfd.enabled=false when installing the device plugin and/or GFD. This should disable the internal NFD dependency.

This is the solution: to get it working in a restricted environment, you first have to install NFD separately.
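
For the separate NFD install, a values override along these lines might be what points its pods at a mirrored image. This is only a sketch: the `image.repository` key is an assumption about the upstream node-feature-discovery chart's values, and the registry host and file name are hypothetical placeholders, so check them against the values template shipped in the chart archive.

```yaml
# nfd-values.yaml (hypothetical file name) for a standalone node-feature-discovery install
image:
  # Hypothetical internal mirror replacing registry.k8s.io/nfd/node-feature-discovery
  repository: registry.example.internal/nfd/node-feature-discovery
```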

Everything works for us now!

But maybe this should be made more explicit in the documentation (or an nfd.image option could be added to the chart). Also, nfd.enabled could be added to the Helm chart example!

https://github.com/NVIDIA/k8s-device-plugin/blob/v0.15.0/deployments/helm/nvidia-device-plugin/values.yaml

Thanks for your support!