Open onstring opened 2 years ago
Hi @onstring,
There are no such metrics as of today. DCGM does not have fields with such information, but there is an API to collect information about running PIDs.
In what form would you want to see this information, and what would you use it for? I can imagine a metric with the total number of processes occupying a GPU, but I do not see how individual processes could be represented or used here. Could you elaborate?
The scenario is our cloud platform: besides the instances that use GPUs, we also have many instances that only use normal compute/CPU resources, so we would like to have statistics on how many GPUs are occupied.
For example, from the above nvidia-smi output, we would like to know the number of processes (and maybe the process names) for each GPU instance:
GPU-d0180485-9584-433c-6782-c335d5df2cb3, 1
GPU-777ead31-954e-837f-590f-6c4974d8e571, 2
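As a stopgap outside DCGM, the per-GPU counts shown above could be derived from `nvidia-smi` itself. The following is only a minimal sketch, not a DCGM feature: it assumes `nvidia-smi` is on the PATH and uses its `--query-compute-apps` CSV output. The function names (`count_processes_per_gpu`, `format_counts`, `query_nvidia_smi`) are made up for illustration.

```python
import csv
import io
import subprocess
from collections import defaultdict

# nvidia-smi query that lists one CSV row per compute process.
QUERY = [
    "nvidia-smi",
    "--query-compute-apps=gpu_uuid,pid,process_name",
    "--format=csv,noheader",
]


def count_processes_per_gpu(csv_text):
    """Group (pid, process_name) pairs by GPU UUID (hypothetical helper)."""
    per_gpu = defaultdict(list)
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        gpu_uuid, pid, name = (field.strip() for field in row[:3])
        per_gpu[gpu_uuid].append((int(pid), name))
    return dict(per_gpu)


def format_counts(per_gpu):
    """Render 'UUID, count' lines in the form shown in this comment."""
    return [f"{uuid}, {len(procs)}" for uuid, procs in sorted(per_gpu.items())]


def query_nvidia_smi():
    """Run nvidia-smi and return its raw CSV output (requires an NVIDIA driver)."""
    return subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout


if __name__ == "__main__":
    for line in format_counts(count_processes_per_gpu(query_nvidia_smi())):
        print(line)
```

Note that a GPU with no compute processes does not appear in this query's output at all, so idle GPUs would have to be filled in with a count of 0 from a separate device listing.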
Do we have any metric for the compute processes allocated on each GPU, or would it be worth adding one, similar to the following output of nvidia-smi: