NVIDIA / gpu-operator

NVIDIA GPU Operator creates/configures/manages GPUs atop Kubernetes
Apache License 2.0

Add nodeSelect to ClusterPolicySpec CRD to select a different MIG strategy on different nodes #704

Closed: lengrongfu closed this issue 2 months ago

lengrongfu commented 2 months ago

I have a use case. We have a GPU cluster that uses MIG, and currently the MIG strategy applies to the entire cluster: it can only be single or mixed. I want to set some nodes to single mode and other nodes to mixed mode.

lengrongfu commented 2 months ago

If the community thinks this use case is worth supporting, I can contribute it.

klueska commented 2 months ago

This is already supported using a custom config file for the device plugin.

The docs are a bit disorganized in that the instructions for how to supply a custom config file for the plugin are buried in the instructions for setting up time-slicing, but the content is still relevant: https://docs.nvidia.com/datacenter/cloud-native/gpu-operator/23.9.2/gpu-sharing.html#configuration
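A minimal sketch of the custom config described in the linked docs, assuming the operator runs in the `gpu-operator` namespace and its ClusterPolicy is named `cluster-policy` (the Helm chart defaults). The ConfigMap name `mig-strategy-config` and the keys `mig-single`/`mig-mixed` are hypothetical. Each data key holds one complete device-plugin config file; here only `flags.migStrategy` differs, and the time-slicing `sharing` section from the docs is omitted. The script emits JSON, which `kubectl apply -f` accepts, so no YAML library is needed.

```python
import json
import sys

NAMESPACE = "gpu-operator"              # assumed operator namespace (Helm default)
CONFIGMAP_NAME = "mig-strategy-config"  # hypothetical ConfigMap name
MIG_STRATEGIES = ("none", "single", "mixed")


def plugin_config(mig_strategy: str) -> str:
    """Return a minimal device-plugin config file (YAML text) for one MIG strategy."""
    if mig_strategy not in MIG_STRATEGIES:
        raise ValueError(f"unsupported migStrategy: {mig_strategy!r}")
    return f"version: v1\nflags:\n  migStrategy: {mig_strategy}\n"


def build_configmap() -> dict:
    """One ConfigMap with one data key per named config; nodes select a key later."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": CONFIGMAP_NAME, "namespace": NAMESPACE},
        "data": {
            "mig-single": plugin_config("single"),  # hypothetical key
            "mig-mixed": plugin_config("mixed"),    # hypothetical key
        },
    }


def build_clusterpolicy_patch(default_config: str = "mig-single") -> dict:
    """Merge patch pointing the device plugin at the ConfigMap; `default` covers unlabeled nodes."""
    return {
        "spec": {
            "devicePlugin": {
                "config": {"name": CONFIGMAP_NAME, "default": default_config}
            }
        }
    }


if __name__ == "__main__":
    if sys.argv[1:] == ["patch"]:
        # Body for: kubectl patch clusterpolicy/cluster-policy --type merge -p '<this>'
        print(json.dumps(build_clusterpolicy_patch()))
    else:
        # ConfigMap manifest for: kubectl apply -f <file>
        print(json.dumps(build_configmap(), indent=2))
```

Usage, assuming the script is saved under the hypothetical name `mig_config.py`: `python mig_config.py > mig-config.json && kubectl apply -f mig-config.json`, then `kubectl patch clusterpolicy/cluster-policy --type merge -p "$(python mig_config.py patch)"`.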

Whatever value you set for migStrategy in a given config is applied to each node you associate with that config.
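In the linked docs, a node is associated with a named config by setting the node label `nvidia.com/device-plugin.config` to one of the data keys in the device-plugin ConfigMap that the ClusterPolicy references. The sketch below only builds the `kubectl label` commands; the node names and the `mig-single`/`mig-mixed` config keys are hypothetical and must match your own cluster and ConfigMap.

```python
import shlex

# Node label the GPU Operator's device plugin reads to pick a named config.
CONFIG_LABEL = "nvidia.com/device-plugin.config"

# Hypothetical node names -> hypothetical config keys (must exist in the ConfigMap).
NODE_CONFIGS = {
    "gpu-node-a": "mig-single",
    "gpu-node-b": "mig-single",
    "gpu-node-c": "mig-mixed",
}


def label_commands(node_configs: dict[str, str]) -> list[list[str]]:
    """Build one `kubectl label` argv per node; --overwrite lets a node switch configs later."""
    return [
        ["kubectl", "label", "node", node, f"{CONFIG_LABEL}={config}", "--overwrite"]
        for node, config in sorted(node_configs.items())
    ]


if __name__ == "__main__":
    for argv in label_commands(NODE_CONFIGS):
        # Print instead of executing; pass argv to subprocess.run(argv, check=True) to apply.
        print(shlex.join(argv))
```

Keeping the mapping in one dictionary makes the single/mixed split per node explicit; relabeling a node with a different key switches its strategy without touching the cluster-wide ClusterPolicy.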

lengrongfu commented 2 months ago

Thanks for the documentation you provided. The following are the steps I took to implement the use case.

cdesiniotis commented 2 months ago

Closing as the requested feature is already supported.