j0hnL closed this issue 5 days ago.
This is what I think:
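Identify Nodes without GPUs: You need a mechanism to determine which compute nodes in the Kubernetes cluster have no GPUs. This can be done through manual inspection or with automated scripts that query each node's hardware. Checking a node's Kubernetes capacity is not enough on its own, because `nvidia.com/gpu` only appears there once the device plugin is running on that node.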
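A minimal sketch of the automated check, assuming the output of `lspci -nn` has already been collected from every host in the inventory (how it is gathered is left open). It flags a node as having an NVIDIA GPU when a display-class PCI device carries NVIDIA's vendor ID `10de`. The function names and the sample inventory are hypothetical, not part of Omnia.

```python
import re

# PCI display-class codes as printed by `lspci -nn`:
# 0300 = VGA compatible controller, 0302 = 3D controller, 0380 = display controller.
DISPLAY_CLASS = re.compile(r"\[03(?:00|02|80)\]")
# 10de is NVIDIA's PCI vendor ID; `lspci -nn` prints it as [10de:xxxx].
NVIDIA_VENDOR = re.compile(r"\[10de:[0-9a-f]{4}\]", re.IGNORECASE)


def has_nvidia_gpu(lspci_output: str) -> bool:
    """Return True if any display-class device in the lspci output is NVIDIA."""
    return any(
        DISPLAY_CLASS.search(line) and NVIDIA_VENDOR.search(line)
        for line in lspci_output.splitlines()
    )


def nodes_without_gpu(inventory: dict[str, str]) -> list[str]:
    """Given hostname -> `lspci -nn` output, return hosts with no NVIDIA GPU."""
    return sorted(host for host, out in inventory.items() if not has_nvidia_gpu(out))


if __name__ == "__main__":
    # Hypothetical sample data; real output would be collected from each node.
    inventory = {
        "compute-01": "3b:00.0 3D controller [0302]: NVIDIA Corporation GV100GL "
                      "[Tesla V100 PCIe 32GB] [10de:1db6] (rev a1)",
        "compute-02": "03:00.0 VGA compatible controller [0300]: ASPEED Technology, "
                      "Inc. ASPEED Graphics Family [1a03:2000] (rev 41)",
    }
    print(nodes_without_gpu(inventory))  # ['compute-02']
```

Filtering on the display class matters because NVIDIA also ships non-GPU PCI functions (for example, the audio function on a GPU board), which share vendor ID `10de`.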
Node Labeling:
Once you have identified the nodes without GPUs, label them using `kubectl label nodes`.
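A small sketch that builds the `kubectl label nodes <node> <key>=<value>` argument list for each node. The label key `omnia.example/gpu` and its `present`/`absent` values are hypothetical placeholders; labeling GPU nodes as well makes it possible to target them with a node selector later.

```python
import shlex

# Hypothetical label key; not something Omnia or NVIDIA defines.
GPU_LABEL_KEY = "omnia.example/gpu"


def label_command(node: str, has_gpu: bool) -> list[str]:
    """Build a `kubectl label nodes` argv that records GPU presence on a node."""
    value = "present" if has_gpu else "absent"
    # --overwrite lets the command update a label that is already set.
    return ["kubectl", "label", "nodes", node, f"{GPU_LABEL_KEY}={value}", "--overwrite"]


if __name__ == "__main__":
    print(shlex.join(label_command("compute-02", has_gpu=False)))
    # kubectl label nodes compute-02 omnia.example/gpu=absent --overwrite
```

Each argv can be passed directly to `subprocess.run(cmd, check=True)` instead of being printed.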
Node Tainting:
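Apply a taint to each node without a GPU. A `NoSchedule` taint keeps the scheduler from placing any pod that lacks a matching toleration on that node, so GPU workloads (including the NVIDIA device plugin DaemonSet) will not land there, but neither will any other workload that lacks the toleration.
Use `kubectl taint nodes`.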
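To make the repel behavior concrete, here is a sketch that builds the `kubectl taint nodes <node> key=value:Effect` command and models, in simplified form, how Kubernetes matches a pod's tolerations against a taint. The taint key `omnia.example/no-gpu` is hypothetical, and `tolerates()` is an illustration of the documented matching rules, not the scheduler's actual code.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule", or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # an empty key with operator Exists tolerates every taint
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # an empty effect matches every effect


# Hypothetical taint for nodes without GPUs.
NO_GPU_TAINT = Taint(key="omnia.example/no-gpu", value="true", effect="NoSchedule")


def taint_command(node: str, taint: Taint) -> list[str]:
    """Build `kubectl taint nodes <node> key=value:Effect`."""
    return ["kubectl", "taint", "nodes", node, f"{taint.key}={taint.value}:{taint.effect}"]


def tolerates(tolerations: list[Toleration], taint: Taint) -> bool:
    """Simplified toleration matching: True if any toleration matches the taint."""
    for t in tolerations:
        if t.effect and t.effect != taint.effect:
            continue
        if t.operator == "Exists":
            if not t.key or t.key == taint.key:
                return True
        elif t.key == taint.key and t.value == taint.value:
            return True
    return False


if __name__ == "__main__":
    print(shlex.join(taint_command("compute-02", NO_GPU_TAINT)))
    # kubectl taint nodes compute-02 omnia.example/no-gpu=true:NoSchedule
    print(tolerates([], NO_GPU_TAINT))  # False: a pod with no tolerations is repelled
    batch = [Toleration(key="omnia.example/no-gpu", operator="Exists", effect="NoSchedule")]
    print(tolerates(batch, NO_GPU_TAINT))  # True: this pod may still be scheduled there
```

The second check is the important caveat: an ordinary CPU-only pod is repelled just like a GPU pod unless it is given the toleration.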
Configure Workloads: Ensure that GPU-dependent workloads, including the device plugin, use a node selector (or node affinity) on the GPU label and do not tolerate the taint applied to nodes without GPUs. Workloads that should still run on those nodes need a matching toleration in their Pod specification.
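A sketch of the two kinds of Pod specification, built as Python dicts and emitted as JSON (which `kubectl apply -f -` accepts as well as YAML). The label key, taint key, and image names are hypothetical; substitute whatever keys were actually applied to the nodes. `nvidia.com/gpu` is the extended resource the NVIDIA device plugin advertises.

```python
import json

GPU_LABEL_KEY = "omnia.example/gpu"        # hypothetical label key
NO_GPU_TAINT_KEY = "omnia.example/no-gpu"  # hypothetical taint key


def gpu_pod(name: str, image: str, gpus: int = 1) -> dict:
    """Pod that requests NVIDIA GPUs and is pinned to GPU-labeled nodes."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # nodeSelector restricts the pod to nodes labeled as having a GPU.
            "nodeSelector": {GPU_LABEL_KEY: "present"},
            "containers": [{
                "name": name,
                "image": image,
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
            # Deliberately no toleration for the no-GPU taint.
        },
    }


def cpu_pod(name: str, image: str) -> dict:
    """Pod that is allowed to run on nodes without GPUs, so it tolerates the taint."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{"name": name, "image": image}],
            "tolerations": [{
                "key": NO_GPU_TAINT_KEY,
                "operator": "Exists",
                "effect": "NoSchedule",
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(gpu_pod("cuda-job", "registry.example/cuda-app:latest"), indent=2))
```

For the device plugin DaemonSet, the same `nodeSelector` would go under the DaemonSet's `spec.template.spec` so its pods only start on GPU nodes.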
This issue is fixed by PR #2238.
@sujit-jadhav @j0hnL, can we close this issue?
Describe the bug
When the k8s-manager does not have a GPU, Omnia does not deploy the `k8s-device-plugin`, even if compute nodes have GPUs. We need to inspect the entire inventory for GPUs before deploying the plugin. I suggest we also taint or label any compute nodes that do not have GPUs, because NVIDIA's plugin does not check for GPUs itself. The AMD plugin seems to deploy fine whether or not AMD accelerators are present.
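The proposed fix can be expressed as a small decision function: deploy the plugin if any host in the inventory has a GPU, rather than looking only at the manager, and return the compute nodes that should be labeled or tainted. This is a hypothetical sketch of the logic, not Omnia's code; it assumes a per-host NVIDIA GPU count has already been gathered.

```python
def plan_device_plugin(gpu_counts: dict[str, int], manager: str) -> tuple[bool, list[str]]:
    """Decide plugin deployment from the whole inventory, not just the manager.

    gpu_counts maps every hostname (manager and compute nodes) to the number of
    NVIDIA GPUs found on it. Returns (deploy, compute_nodes_without_gpu).
    """
    # Deploy if any host at all has a GPU.
    deploy = any(count > 0 for count in gpu_counts.values())
    # Only compute nodes are returned for labeling or tainting.
    no_gpu = sorted(h for h, c in gpu_counts.items() if c == 0 and h != manager)
    return deploy, no_gpu


if __name__ == "__main__":
    inventory = {"manager": 0, "compute-01": 4, "compute-02": 0}
    print(inventory["manager"] > 0)                   # False: a manager-only check skips the plugin
    print(plan_device_plugin(inventory, "manager"))   # (True, ['compute-02'])
```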