NVIDIA / k8s-dra-driver

Dynamic Resource Allocation (DRA) for NVIDIA GPUs in Kubernetes
Apache License 2.0

Does DRA support multiple GPUs across worker nodes? #97

Open thj08 opened 3 months ago

thj08 commented 3 months ago

I want the master node to be able to allocate available GPUs across different worker nodes. Does DRA support multiple GPUs across worker nodes?

ArangoGutierrez commented 3 months ago

So far, DRA enables resources on a per-node basis, given its interaction with the kubelet. If I understand correctly, what you are asking for is multi-node DRA.
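
To illustrate what "per-node" means in practice, here is a toy sketch in Python. It is not the driver's or the scheduler's actual logic; the function `pick_node` and the node names are hypothetical and only model the constraint that a single request has to be satisfied by GPUs on one node.

```python
from typing import Dict, Optional


def pick_node(free_gpus: Dict[str, int], requested: int) -> Optional[str]:
    """Return a node that can satisfy the whole request by itself, or None.

    Hypothetical helper: models per-node allocation only, where GPUs on
    different nodes are never pooled to satisfy one request.
    """
    for node, free in sorted(free_gpus.items()):
        if free >= requested:
            return node
    return None


# Assumed example cluster: three workers with one free GPU each.
cluster = {"worker-1": 1, "worker-2": 1, "worker-3": 1}

print(pick_node(cluster, 1))  # worker-1
print(pick_node(cluster, 3))  # None: no single node has 3 GPUs
```

Under this model, a request for 3 GPUs fails even though the cluster has 3 GPUs in total, because no single node holds all of them.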

thj08 commented 3 months ago

Thanks for the reply. In my actual setup, the cluster has 1 master node and 3 worker nodes, and every worker node has one GPU. Is it possible to run a container on one worker node that uses all 3 GPUs?

By the way, I also tried the DRA demo project, which uses kind for a local cluster. Does DRA support remote clusters?

asm582 commented 3 months ago

I think you are talking about DRA and CXL integration. It has been discussed as one of the use cases but, as I understand it, is not currently implemented.