NVIDIA / cloud-native-stack

Run cloud native workloads on NVIDIA GPUs
Apache License 2.0
118 stars 47 forks source link

GPU Operator Installation Failed #34

Closed xyang-harman closed 1 year ago

xyang-harman commented 1 year ago

When try to follow step: helm install --version 22.09 --create-namespace --namespace gpu-operator-resources --devel nvidia/gpu-operator --wait --generate-name

Got Error message: Error: INSTALLATION FAILED: Get "https://10.158.210.53:6443/apis/apiextensions.k8s.io/v1/customresourcedefinitions/nodefeaturerules.nfd.k8s-sigs.io": dial tcp 10.158.210.53:6443: connect: connection refused - error from a previous attempt: http2: server sent GOAWAY and closed the connection; LastStreamID=661, ErrCode=NO_ERROR, debug=""

I thought it maybe due to a backend timeout. I tried again but got connection refused. Error: INSTALLATION FAILED: Kubernetes cluster unreachable: Get "https://10.158.210.53:6443/version": dial tcp 10.158.210.53:6443: connect: connection refused

Please direct me how to proceed. Thanks!

Thanks,

Frank

xyang-harman commented 1 year ago

This was resolved after I restart service

sudo systemctl restart kubelet.service

Thanks,

Frank