NVIDIA / gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Apache License 2.0

k8s 1.22.10, helm install nvidia-operator, raise error: unknown field "grpc" in io.k8s.api.core.v1.Probe #726

Open sycbbyes opened 4 months ago

sycbbyes commented 4 months ago

The template below is mostly useful for bug reports and support questions. Feel free to remove anything which doesn't apply to you and add more information where it makes sense.

Important Note: NVIDIA AI Enterprise customers can get support from NVIDIA Enterprise support. Please open a case here.

1. Quick Debug Information

2. Issue or feature description

helm install --wait --generate-name -n gpu-operator --create-namespace nvidia/gpu-operator

WARNING: Kubernetes configuration file is group-readable. This is insecure. Location: /root/.kube/config
WARNING: Kubernetes configuration file is world-readable. This is insecure. Location: /root/.kube/config
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: [ValidationError(Deployment.spec.template.spec.containers[0].livenessProbe): unknown field "grpc" in io.k8s.api.core.v1.Probe, ValidationError(Deployment.spec.template.spec.containers[0].readinessProbe): unknown field "grpc" in io.k8s.api.core.v1.Probe]

3. Steps to reproduce the issue

Detailed steps to reproduce the issue.

4. Information to attach (optional if deemed irrelevant)

Collecting full debug bundle (optional):

curl -o must-gather.sh -L https://raw.githubusercontent.com/NVIDIA/gpu-operator/master/hack/must-gather.sh 
chmod +x must-gather.sh
./must-gather.sh

NOTE: please refer to the must-gather script for debug data collected.

This bundle can be submitted to us via email: operator_feedback@nvidia.com

sycbbyes commented 4 months ago

Due to some legacy constraints, we have to stay on Kubernetes 1.22, but the chart cannot be installed there via helm. It fails with this error:

Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: [ValidationError(Deployment.spec.template.spec.containers[0].livenessProbe): unknown field "grpc" in io.k8s.api.core.v1.Probe, ValidationError(Deployment.spec.template.spec.containers[0].readinessProbe): unknown field "grpc" in io.k8s.api.core.v1.Probe]

How can we get rid of this issue? Is it possible to install an earlier version of the operator to avoid this error?

Thanks.

slyt commented 4 months ago

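This is because gRPC probes were added to K8s as a feature-gated alpha in v1.23, became beta in v1.24, and went GA in v1.27. On v1.22 the Probe type has no grpc field at all, so the manifest fails validation before anything is created.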
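The version gating above can be sketched as a small lookup. This is only an illustration, not part of the operator or of Kubernetes: the function name and the stage labels are made up here, and the note that the alpha feature gate (GRPCContainerProbe, to my knowledge) is off by default reflects the usual Kubernetes convention for alpha features.

```python
import re

# Minor versions at which gRPC probes changed stage, as described above.
GRPC_PROBE_ALPHA = 23  # alpha: behind a feature gate, off by default
GRPC_PROBE_BETA = 24   # beta: enabled by default
GRPC_PROBE_GA = 27     # generally available


def grpc_probe_stage(version: str) -> str:
    """Return the gRPC probe stage for a Kubernetes 1.x version string.

    Hypothetical helper for illustration only.
    """
    match = re.match(r"v?1\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognised Kubernetes version: {version!r}")
    minor = int(match.group(1))
    if minor < GRPC_PROBE_ALPHA:
        return "unsupported"  # the Probe schema has no grpc field
    if minor < GRPC_PROBE_BETA:
        return "alpha"
    if minor < GRPC_PROBE_GA:
        return "beta"
    return "ga"


# Versions mentioned in this thread, plus one GA release for comparison.
for v in ("1.22.10", "1.22.12", "v1.23.0", "1.27.1"):
    print(v, grpc_probe_stage(v))
```

For the versions reported in this issue (1.22.10 and 1.22.12) the result is "unsupported", which matches the validation error: the API server's schema simply does not know the field.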

You will likely have trouble with the nvidia-gpu-operator-node-feature-discovery-master pod never becoming healthy (showing 0/1 Ready) since it uses gRPC readiness and liveness probes that are not (currently) configurable. I'm on k8s v1.23 and having this issue.
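To see which containers in a rendered manifest would be affected, one could scan the parsed objects for probes that carry a grpc field. The sketch below assumes the manifest has already been loaded into a Python dict (for example from the chart's rendered output); the helper name and the sample Deployment, including its container name and port, are hypothetical and not taken from the actual chart.

```python
from typing import Any, Dict, List

PROBE_FIELDS = ("livenessProbe", "readinessProbe", "startupProbe")


def find_grpc_probes(manifest: Dict[str, Any]) -> List[str]:
    """List the field paths of gRPC probes in a Deployment-like manifest."""
    kind = manifest.get("kind", "?")
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    hits = []
    for i, container in enumerate(pod_spec.get("containers", [])):
        for field in PROBE_FIELDS:
            probe = container.get(field) or {}
            if "grpc" in probe:
                hits.append(f"{kind}.spec.template.spec.containers[{i}].{field}")
    return hits


# Hypothetical Deployment shaped like the one the error message complains about.
sample = {
    "kind": "Deployment",
    "spec": {
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "master",  # made-up name
                        "livenessProbe": {"grpc": {"port": 8080}},  # made-up port
                        "readinessProbe": {"grpc": {"port": 8080}},
                    }
                ]
            }
        }
    },
}

for path in find_grpc_probes(sample):
    print(path)
```

The paths it reports have the same shape as the ones in the ValidationError above, which makes it easier to tell which Deployment in the release is the one being rejected.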

Lee-1024 commented 3 months ago

I encountered the same issue, and my k8s version is 1.22.12. Is there a solution?

ArangoGutierrez commented 3 months ago

NFD conversation: https://github.com/kubernetes-sigs/node-feature-discovery/issues/1730#issuecomment-2172675081