To ease the integration with queuing systems like Kueue, we want to let autopilot add a temporary label to nodes when trying to run an invasive health check.
The suggested label is autopilot.ibm.com/gpuhealth=TESTING
This way, any workload managed by a queue that does not have a toleration for that label cannot occupy the node.
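
As a rough illustration of the temporary label lifecycle, here is a minimal Python sketch. It only builds the JSON merge patch bodies that would set and later clear the label, and applies them to an in-memory dictionary standing in for a node object. How autopilot actually applies the label is not specified here; the node name worker-0, the helper names, and the choice to remove the label after the check (rather than overwrite it with another value) are assumptions made for the example.

```python
import json

LABEL_KEY = "autopilot.ibm.com/gpuhealth"  # label key from the proposal
TESTING = "TESTING"                        # label value from the proposal


def label_patch(value):
    """Build a JSON merge patch body; a value of None asks for the label to be removed."""
    return {"metadata": {"labels": {LABEL_KEY: value}}}


def apply_patch(node, patch):
    """Apply the label part of a merge patch to an in-memory node dict.

    Hypothetical stand-in for the API server, so the sketch runs without a cluster.
    """
    labels = dict(node.get("metadata", {}).get("labels", {}))
    for key, value in patch["metadata"]["labels"].items():
        if value is None:
            labels.pop(key, None)  # null in a merge patch deletes the key
        else:
            labels[key] = value
    return {**node, "metadata": {**node.get("metadata", {}), "labels": labels}}


# Hypothetical node object, reduced to the fields the sketch needs.
node = {"metadata": {"name": "worker-0", "labels": {"kubernetes.io/hostname": "worker-0"}}}

during_check = apply_patch(node, label_patch(TESTING))      # before the invasive check starts
after_check = apply_patch(during_check, label_patch(None))  # once the check has finished

print(json.dumps(label_patch(TESTING)))
```

In a real cluster the same patch bodies could be sent to the node resource by whatever client autopilot uses; the sketch deliberately stops short of that.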