Open ccding opened 8 months ago
Depending on your driver version, you may also need to include DCGM_FI_DEV_FB_RESERVED in your equation.
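To make the suggested equation concrete, here is a minimal sketch of the arithmetic. It assumes the two metrics being summed are DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE (the thread does not name them) and that all values are in MiB. The function names and sample numbers are hypothetical, not part of DCGM or dcgm-exporter.

```python
from typing import Optional


def fb_total_mib(used: float, free: float, reserved: Optional[float] = None) -> float:
    """Approximate total framebuffer memory as used + free (+ reserved).

    Assumption: the inputs correspond to DCGM_FI_DEV_FB_USED,
    DCGM_FI_DEV_FB_FREE and, when the driver reports it,
    DCGM_FI_DEV_FB_RESERVED, all in MiB.
    """
    total = used + free
    if reserved is not None:
        # Only add the reserved term when the metric is actually exported.
        total += reserved
    return total


def fb_usage_ratio(used: float, free: float, reserved: Optional[float] = None) -> float:
    """Fraction of the approximated total that is currently in use."""
    total = fb_total_mib(used, free, reserved)
    if total <= 0:
        raise ValueError("total framebuffer memory must be positive")
    return used / total


if __name__ == "__main__":
    # Hypothetical sample values in MiB, for illustration only.
    print(fb_total_mib(1000.0, 2000.0, 500.0))
    print(fb_usage_ratio(1000.0, 3000.0))
```

If the reserved metric is missing from the scrape, as reported later in this thread, passing `reserved=None` falls back to used + free, which may explain a total that drifts over time.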
Thanks for the response. My driver version is 535.129.03, and I don't see DCGM_FI_DEV_FB_RESERVED in my Prometheus.
The output of nvidia-smi shows an accurate, constant total GPU memory.
These are the only available metrics
@nvvfedorov is this fixed?
I have exactly the same use case and ran into the same issue. Why is there no metric showing the total amount of GPU memory (like 32GB for V100, 80GB for H100, etc.)?
I reopened the issue because it is still active and of interest to the community.
I have the same problem
I need to compute some GPU usage statistics, but I get wrong results.
Our use case needs to show GPU memory usage over total memory, so we used the sum of the above two metrics as the total GPU memory, but the sum does not appear to be constant.
here is the output