Closed — estherxyz closed this issue 4 years ago
Hello!
There are different NVIDIA GPU models installed on a k8s cluster (or a single machine). How can I specify a particular NVIDIA GPU type in a k8s YAML?
You will need to tag the nodes with a label (e.g. nvidia.com/gpu.family) and can then use nodeSelectors in your pod spec to target specific nodes.
Take a look at GPU Feature Discovery, which can automatically label nodes for you: https://github.com/NVIDIA/gpu-feature-discovery
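For example, once GPU Feature Discovery has labeled your nodes, a pod spec along these lines should only be scheduled onto nodes with the desired GPU model. This is a minimal sketch; the label value and image tag are illustrative, so check the labels actually applied to your nodes with `kubectl get nodes --show-labels`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  nodeSelector:
    # Label set by gpu-feature-discovery; the value here is just an example.
    nvidia.com/gpu.product: "Tesla-V100-SXM2-16GB"
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0-base   # example image tag
    resources:
      limits:
        nvidia.com/gpu: 1          # request one GPU from the device plugin
```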
I have a single-node cluster with multiple types of GPUs installed (one 3070, one 3080). In this case the nodeSelector solution would not work. Is it possible for the k8s device plugin to distinguish between different GPU resource types?
@pbxqdown At the moment this is not possible. Only a single GPU type per node is supported. However, we are planning to add support for this in the coming months. Stay tuned.
@klueska Thanks, this is awesome! Let me know if I can be of any help with testing or something.
@klueska Are there any news on this feature or is there another issue to watch for this? We have one node with mixed GPU types and it would be great to have this granularity when requesting resources.
any progress?
We had added support about 6 months ago to allow such setups to be detected and allow users to assign a different resource name to each of them (i.e. nvidia.com/rtx-2080 vs nvidia.com/rtx-3090), but it got reverted because our product team wasn’t happy putting arbitrary resource naming in the hands of users.
This is how it would have worked: https://docs.google.com/document/d/1dL67t9IqKC2-xqonMi6DV7W2YNZdkmfX7ibB6Jb-qmk/edit
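For illustration, under the proposal in that document a pod on a mixed node would have requested a specific model by its own resource name, roughly as sketched below. The resource name is hypothetical, and since the support was reverted this does not work with released versions of the plugin:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: rtx-3090-pod
spec:
  containers:
  - name: cuda-container
    image: nvidia/cuda:11.0-base   # example image tag
    resources:
      limits:
        nvidia.com/rtx-3090: 1     # hypothetical per-model resource name
```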
So will this feature not be supported in the future, or is a new plan being worked on?
1. Issue or feature description
There are different NVIDIA GPU models installed on the k8s cluster (or a single machine). How can I specify a particular NVIDIA GPU type in a k8s YAML?
Different GPU types may require different CUDA versions, and the Docker image needs to use a CUDA version that is compatible with the GPU device.
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
Common error checking:
- The output of `nvidia-smi -a` on your host
- Your docker configuration file (e.g. `/etc/docker/daemon.json`)

Additional information that might help better understand your environment and reproduce the bug:
- `docker version`
- `uname -a`
- `nvidia-container-cli -V`