NVIDIA / gpu-operator

NVIDIA GPU Operator creates, configures, and manages GPUs in Kubernetes
https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/latest/index.html
Apache License 2.0

Unable to cordon nodes #536

Open guyst16 opened 1 year ago

guyst16 commented 1 year ago

1. Quick Debug Checklist

shivamerla commented 1 year ago

@guyst16 can you attach the gpu-operator pod logs so we can confirm whether gpu-operator is triggering the un-cordon of the node? Also, can you try setting driver.upgradePolicy.autoUpgrade=false in the ClusterPolicy and check whether the same behavior still occurs?
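
For reference, a minimal sketch of how the two requests above might be carried out. It only builds and prints the kubectl commands; it does not run them. The namespace (gpu-operator), the Deployment name (gpu-operator), and the ClusterPolicy name (cluster-policy) are assumptions based on a typical install and are not stated in this thread, so adjust them to match your cluster.

```python
import json
import shlex

# Assumptions (not stated in the thread): namespace, Deployment name, and
# ClusterPolicy name of a typical install. Adjust to match your cluster.
NAMESPACE = "gpu-operator"
OPERATOR_DEPLOYMENT = "gpu-operator"
CLUSTER_POLICY = "cluster-policy"


def logs_command(namespace=NAMESPACE, deployment=OPERATOR_DEPLOYMENT):
    """Return the kubectl argv that dumps the operator pod logs."""
    return ["kubectl", "logs", "-n", namespace, f"deployment/{deployment}"]


def disable_auto_upgrade_command(policy=CLUSTER_POLICY):
    """Return the kubectl argv that sets driver.upgradePolicy.autoUpgrade=false.

    ClusterPolicy is a cluster-scoped custom resource, so no namespace flag
    is passed, and a JSON merge patch is used.
    """
    patch = {"spec": {"driver": {"upgradePolicy": {"autoUpgrade": False}}}}
    return [
        "kubectl", "patch", f"clusterpolicies.nvidia.com/{policy}",
        "--type", "merge", "-p", json.dumps(patch),
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for review before running them by hand.
    for argv in (logs_command(), disable_auto_upgrade_command()):
        print(shlex.join(argv))
```

Redirect the output of the first command to a file to attach it to the issue. After applying the second command, cordon a node again and check whether it stays cordoned.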