Hi, I'm trying to use dcgm-exporter to monitor GPU utilization on a node that has MIG devices.
The MIG `mixed` strategy is set on Kubernetes, and the GPUs are configured as follows:
gpu0: MIG enabled (7 MIG devices)
gpu1 ~ gpu6: MIG disabled
When I scrape metrics from dcgm-exporter, I only get information for gpu0 (MIG), not for gpu1 ~ gpu6. Prometheus also shows only gpu0 and none of the others.
Is it possible to get metric values for both the MIG-enabled and the MIG-disabled GPUs?
- In Prometheus, when I search for the `DCGM_FI_PROF_GR_ENGINE_ACTIVE` metric, these results are shown:
![image](https://user-images.githubusercontent.com/17642294/132439307-3c5dff28-6af1-4d82-a437-ffab6ffdad5a.png)
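To tell whether gpu1 ~ gpu6 are missing from the exporter output itself or are only being dropped on the Prometheus side, the raw `/metrics` text can be grouped by GPU. This is a minimal sketch, not an official tool. It assumes the exporter labels each sample with `gpu` and, for MIG instances, `GPU_I_ID`. The helper names `gpus_reporting` and `fetch_metrics` are made up for illustration.

```python
import re
from collections import defaultdict
from urllib.request import urlopen

# One sample line of the Prometheus text format: name{labels} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)'
)
# key="value" pairs inside the braces (escaped quotes are not handled)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def gpus_reporting(metrics_text, metric="DCGM_FI_PROF_GR_ENGINE_ACTIVE"):
    """Map each GPU index to the MIG instance IDs that report `metric`.

    A GPU that reports the metric without a GPU_I_ID label (assumed here to
    mean MIG is disabled on it) maps to an empty list.
    """
    seen = defaultdict(set)
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        gpu = labels.get("gpu")  # assumed label name for the GPU index
        if gpu is None:
            continue
        instances = seen[gpu]  # registers the GPU even without MIG labels
        if labels.get("GPU_I_ID"):  # assumed label name for the MIG instance
            instances.add(labels["GPU_I_ID"])
    # Instance IDs are assumed to be numeric strings
    return {gpu: sorted(ids, key=int) for gpu, ids in seen.items()}


def fetch_metrics(url, timeout=5.0):
    """Download the exporter's metrics page as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, `gpus_reporting(fetch_metrics("http://localhost:9400/metrics"))` returns a dictionary keyed by GPU index. The URL is a placeholder and should be replaced with the address of the dcgm-exporter pod or service. If only key `"0"` comes back, the non-MIG GPUs are already missing at the exporter, not in Prometheus.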
- `nvidia-smi` output:
+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    7   0   0  |   4846MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      2MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    8   0   1  |      3MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    9   0   2  |      3MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   10   0   3  |      3MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   11   0   4  |      3MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   12   0   5  |      3MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   13   0   6  |      3MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0    7    0     585962      C   python                           4839MiB |
+-----------------------------------------------------------------------------+