Closed: andiariffin closed this issue 3 years ago
Hi @andiariffin. Thanks for the report.
A bug was fixed in the plugin some time ago to divide by 1024 instead of 1000 here: https://github.com/NVIDIA/k8s-device-plugin/blob/master/mig-strategy.go#L208
It looks like that change didn't make it into gpu-feature-discovery though: https://github.com/NVIDIA/gpu-feature-discovery/blob/master/mig-strategy.go#L276
Ideally there would be one library that both of these pulled from for this, but unfortunately that is not the state of things yet. In any case, we will push a fix out for this soon. Thanks again for reporting.
This has now been fixed in https://gitlab.com/nvidia/kubernetes/gpu-feature-discovery/-/merge_requests/61 and will be part of the next GFD release. Thanks for reporting.
Hi, I was deploying a few clusters with A100 GPUs using DeepOps 20.12. The mixed MIG strategy is used, as defined in the DeepOps configuration. However, I noticed that GFD reported some incorrect labels, i.e.:
The last three lines were under Capacity, Allocatable, and Allocated resources respectively, and those were already correct. However, the Labels entry was incorrectly defined (e.g. mig-3g.21gb, which should be mig-3g.20gb).
Although this issue does not seem to break any K8s functionality in terms of deploying pods with MIG, it would be nice to have the GPU nodes properly labeled.