selinnilesy opened this issue 1 year ago
I don't know if the official device plugin will support MPS (the conclusion of this blog says it will), but I was able to test MPS using Nebuly's fork.
We do not currently have plans to support it in the traditional device plugin, but we plan on supporting it with the new resource management API called DRA, as shown in this demo:
https://drive.google.com/file/d/1sU1ZhY4zNKBtXAeVx3sozHT0PuDz4-Qf/view?usp=drive_link
Thank you Kevin. I have installed the NVIDIA DRA driver too, but I am afraid the CRDs for deploying MPS sharing are not supported yet.
@klueska, could you provide a detailed product page or GitHub repository for DRA? Or the product roadmap?
Would like to see this sooner rather than later. Seems pretty fundamental.
Yes, we would also like to hear about this MPS/k8s integration ASAP.
This is related to the discussion on #467. We are actively investigating adding MPS support to the plugin.
@elezar So is the roadmap to add it to the official device plugin, or to support this as part of DRA as @klueska wrote above? (Or will both of them reach a production-ready stage?)
@ettelr we are working on adding support to the device plugin directly. It will also be included as part of DRA once that is released as a production-ready option.
We just released an RC for the next version of the k8s-device-plugin with support for MPS: https://github.com/NVIDIA/k8s-device-plugin/tree/v0.15.0-rc.1?tab=readme-ov-file#with-cuda-mps
We would appreciate it if people could try this out and give any feedback before the final release in a few weeks.
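For reference, the MPS section of the config file described in that README looks roughly like this (a minimal sketch; a replica count of 10 matches the 1/10 fraction quoted below):

version: v1
sharing:
  mps:
    resources:
    - name: nvidia.com/gpu
      replicas: 10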
@klueska I tested MPS support with v0.15.0-rc.2. It says "Explicitly set sharing.mps.failRequestsGreaterThanOne = true" in the release notes. Is it possible to set sharing.mps.failRequestsGreaterThanOne to false? It didn't work when I tried with a config file like the one below.
sharing:
  mps:
    failRequestsGreaterThanOne: false
    resources:
    - name: nvidia.com/gpu
      replicas: 4
If failRequestsGreaterThanOne is set to false, does it mean that a pod can utilize n times the fraction of the resources when its GPU request is set to n?
Furthermore, each of these resources -- either nvidia.com/gpu or nvidia.com/gpu.shared -- would have access to the same fraction (1/10) of the total memory and compute resources of the GPU.
Is there also a way to apply MPS to only specific GPU server nodes and not others?
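For reference, a pod that requests one of these shared replicas would look roughly like this (a sketch assuming renameByDefault is enabled, so the replicas are advertised as nvidia.com/gpu.shared; with the default settings the request would simply be nvidia.com/gpu: 1):

apiVersion: v1
kind: Pod
metadata:
  name: mps-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda-app
    image: nvidia/cuda:12.3.2-base-ubuntu22.04  # any CUDA-capable image
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu.shared: 1  # one MPS replica of a GPU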
Also interested in this one. Once I set the device plugin sharing config, is there a way to request a different amount of memory for each pod? Is it by calculating the number of shared replicas I need and specifying more than one in the requests? Any other idea how to achieve this?
I am using the stable version of 0.15.0 and I am also interested in MPS with failRequestsGreaterThanOne=false, so that I can allocate different amounts of GPU memory to different processes, or in enabling requests for more than one shared device. Alternatively, being able to set the slice size and let the device plugin create it dynamically would be super useful. Setting a static number of equal slices in the configuration makes this not scalable enough. Is there anything like this on the roadmap?
We plan to include support for MIG devices in the next release, but we do not have immediate plans to relax the failRequestsGreaterThanOne=true constraint.
The challenge is that the device plugin is not the one that does the actual allocation of devices to the pod (the kubelet does), meaning there is no way to ensure that if you are given 2 MPS "replicas" they come from the same underlying GPU.
We thought about leveraging the GetPreferredAllocation() call to support this, but then you run into issues with fragmentation. For example, you request 2 replicas, but the only two replicas available are from separate GPUs. The scheduler won't recognize that and will happily schedule your pod to the node. Once on the node, though, the allocation will fail and your pod will be stuck.
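To make the fragmentation problem concrete, consider a hypothetical node with 2 GPUs, each advertised as 2 MPS replicas (4 x nvidia.com/gpu in total), where one replica on each GPU is already allocated:

# Two nvidia.com/gpu replicas are still free, but they sit on different
# physical GPUs. A pod that asks for both:
resources:
  limits:
    nvidia.com/gpu: 2
# The scheduler only compares counts (2 requested <= 2 free) and places the
# pod on the node; the kubelet then cannot be forced to hand out two replicas
# from the same underlying GPU, so the allocation fails and the pod is stuck.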
All of that said, it would be possible to make this work in cases where you only have a single GPU on the node, and we may consider doing that in the next release.
However, our focus is really on providing comprehensive support for MPS using DRA: https://youtu.be/1QfShSQLsbs?si=LJ_f8UfRjiqcl2Pd&t=1080
Adding MPS support to the existing device plugin is really just a stopgap until DRA is stable / usable.
And what about creating the slices dynamically with different sizes according to the pod's requests, with some convention like requesting 1/2 or 1/4 of a GPU?
Extended resources (as exposed by the Device Plugin API) only represent countable units and don't allow fractional allocations. With that said, we have discussed more flexible partitioning under MPS, but this would not be dynamic in the case of the device plugin.
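Concretely, requests for extended resources are only accepted in whole units:

resources:
  limits:
    nvidia.com/gpu: 1      # valid: whole units only
    # nvidia.com/gpu: 0.5  # invalid: fractional extended-resource requests are rejected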
GPU memory is countable. We just want to be able to request a different amount of GPU memory for each workload, since we have a big system with a lot of GPUs and different types of workloads requesting different amounts of GPU memory.
I'm open to suggestions on how to support this, but I don't know how to do it (generally) with the existing device-plugin API and the default kubernetes scheduler. The scheduler simply doesn't have enough information to know what portion of the capacity on each node comes from the same underlying GPU (vs. multiple GPUs).
One thing we've considered is introducing a config option that would advertise each individual GPU as its own resource (i.e. nvidia.com/gpu-0, nvidia.com/gpu-1, etc.). With something like that in place, we'd be able to provide what you are looking for, but it would put an extra burden on the user to have to request a specific GPU index rather than a generic nvidia.com/gpu resource.
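As a sketch of that idea (hypothetical resource names; no such option exists today), a node with 2 GPUs and 4 MPS replicas each would advertise nvidia.com/gpu-0: 4 and nvidia.com/gpu-1: 4, and a pod could then safely ask for two replicas of one specific GPU:

resources:
  limits:
    nvidia.com/gpu-0: 2  # both replicas are guaranteed to come from GPU 0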
To give an example: if I have 2 nodes, each with 2 80GB GPUs, then instead of a fixed number of gpu.shared, each node could advertise in its capacity nvidia.com/gpu-10gb: 8, nvidia.com/gpu-20gb: 4, nvidia.com/gpu-40gb: 2, nvidia.com/gpu-80gb: 1.
The user can pre-configure all the options he wants, and of course the overlapping counts need to be taken into consideration somehow. Is this feasible?
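Spelled out for a single 80GB GPU (hypothetical resource names), the capacity would look like this:

nvidia.com/gpu-10gb: 8
nvidia.com/gpu-20gb: 4
nvidia.com/gpu-40gb: 2
nvidia.com/gpu-80gb: 1
# All four counts describe the same 80GB of physical memory, so allocating
# one 20GB slice would also need to reduce the 10GB, 40GB, and 80GB counts.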
You can't have overlapping resources using the traditional device plugin. It's the whole reason that MIG requires static partitioning, for example.
OK, we can start from static partitioning here as well, just being able to have a few pre-configured sizes for MPS, until you have a better idea.
What about using a whole GPU side by side with the shared ones? Assuming I turn on renameByDefault, is there a future option where we can use whole GPUs while MPS sharing is on?
@klueska, static partitioning like with MIG is good enough for a start. It would be great if you could implement that.
Also, it would be nice if we had the possibility to use real cards alongside shared ones on the same machine. A static configuration would also be a good enough start for that.
@klueska Is there any way in the meantime to request more than 1 replica from each GPU in my node?
I also want to know about this issue
What would be the best way to enable the Multi-Process Service (MPS) on NVIDIA GPUs for Kubernetes pods? (For example, an example YAML file for deploying K8s Pods on a TITAN RTX.)
I am wondering if there is recent support from NVIDIA for this.
Thanks in advance.