Closed davidLif closed 1 year ago
@davidLif thanks for reporting this. Could you confirm the version of GFD that you are using?
I am using GFD v0.6.1-ubi8. This version was installed by default with gpu-operator v1.11.1.
Thanks for the confirmation. I have created https://gitlab.com/nvidia/kubernetes/gpu-feature-discovery/-/merge_requests/127 to address this issue.
Is this a critical issue from your perspective, or could this wait for the next release, which should be out by the end of September?
I think it can wait for the next release.
Does GFD write a label with its version on the node? I am trying to think of an easy way of handling this case for k8s operators using the nvidia.com/gpu.memory label.
GFD does not generate a label with its version as far as I am aware.
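Since there is no version label to key on, one possible workaround for a consumer of the label is to guess the unit from the magnitude of the value. The following is only a sketch of that idea: the function name normalizeGPUMemoryMiB and the cutoff maxPlausibleMiB are hypothetical and are not part of GFD or gpu-operator, and the heuristic assumes no single GPU reports more than 16 TiB of memory.

```go
package main

import (
	"fmt"
	"strconv"
)

// maxPlausibleMiB is a hypothetical cutoff (16 TiB expressed in MiB).
// Any label value above it is assumed to be a byte count instead.
const maxPlausibleMiB uint64 = 16 * 1024 * 1024

// normalizeGPUMemoryMiB parses the value of the nvidia.com/gpu.memory label
// and returns it in MiB, converting from bytes when the value is too large
// to plausibly be MiB. This is a heuristic, not behavior defined by GFD.
func normalizeGPUMemoryMiB(label string) (uint64, error) {
	v, err := strconv.ParseUint(label, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("parsing gpu.memory label %q: %w", label, err)
	}
	if v > maxPlausibleMiB {
		return v / (1024 * 1024), nil
	}
	return v, nil
}

func main() {
	for _, label := range []string{"42505273344", "40536"} {
		mib, err := normalizeGPUMemoryMiB(label)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %d MiB\n", label, mib)
	}
}
```

Both inputs in main normalize to 40536 MiB, so an operator could treat nodes running the affected and the fixed GFD versions the same way.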
@davidLif the fix for the issue you are seeing was already released in v0.6.2. Could you confirm that this is the case?
@elezar The fix is working. Thanks!
Hello,
While testing a GKE node with a 40 GB GPU, I noticed that the nvidia.com/gpu.memory label on the node had a value of "42505273344". However, according to the README, the label should contain "Memory of the GPU in Mb". After looking at the code, I see that this happens because, for MIG strategy "None", the value is extracted using nvmlDeviceGetMemoryInfo. According to the docs ( https://docs.nvidia.com/deploy/nvml-api/group__nvmlDeviceQueries.html#group__nvmlDeviceQueries_1g2dfeb1db82aa1de91aa6edf941c85ca8 ), the function "retrieves the amount of used, free, reserved and total memory available on the device, in bytes".
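To make the mismatch concrete, here is a minimal sketch of the unit conversion that the reported value appears to be missing. The helper bytesToMiB is a hypothetical name used for illustration, not the code from the actual fix, and it assumes the README's "Mb" means mebibytes (1024 * 1024 bytes).

```go
package main

import "fmt"

// bytesToMiB converts a byte count, such as the total memory that
// nvmlDeviceGetMemoryInfo reports, to whole mebibytes (rounding down).
// Hypothetical helper for illustration only.
func bytesToMiB(b uint64) uint64 {
	return b / (1024 * 1024)
}

func main() {
	// Value observed in the nvidia.com/gpu.memory label on the node.
	const reported uint64 = 42505273344
	fmt.Println(bytesToMiB(reported)) // prints 40536
}
```

A result of 40536 is consistent with a 40 GB GPU, which supports reading the label value as bytes rather than Mb.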