NVIDIA / dcgm-exporter

NVIDIA GPU metrics exporter for Prometheus leveraging DCGM
Apache License 2.0

Why is `DCGM_FI_DEV_PCIE_{TX,RX}_THROUGHPUT` the default instead of `DCGM_FI_PROF_PCIE_{TX,RX}_BYTES`? #354

Closed: koshieguchi closed this issue 2 months ago

koshieguchi commented 3 months ago

I find it very confusing that DCGM_FI_DEV_PCIE_TX_THROUGHPUT is still marked as default in files like ./etc/default-counters.csv, despite the following statement in the DCGM Documentation (Version 3.3):

https://docs.nvidia.com/datacenter/dcgm/latest/dcgm-api/dcgm-api-field-ids.html#c.DCGM_FI_DEV_PCIE_TX_THROUGHPUT

> DCGM_FI_DEV_PCIE_TX_THROUGHPUT (200): PCIe Tx utilization information. Deprecated: Use DCGM_FI_PROF_PCIE_TX_BYTES instead.

In fact, as mentioned in NVIDIA/dcgm-exporter issue #167 ("Get DCGM_FI_DEV_PCIE_TX_THROUGHPUT metric failed"), I was able to obtain PCIe data by changing DCGM_FI_DEV_PCIE_TX_THROUGHPUT to DCGM_FI_PROF_PCIE_TX_BYTES.
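For anyone who wants to try the same swap locally, here is a minimal Python sketch of the rename. It assumes each counters CSV line has the shape `field, type, help`. The sample lines and help strings are made up for illustration and are not copied from ./etc/default-counters.csv, and `swap_deprecated_fields` is a hypothetical helper, not part of dcgm-exporter. The RX mapping is inferred from the issue title.

```python
# Deprecated field -> suggested replacement, per the issue title and the
# deprecation note in the DCGM documentation quoted above.
REPLACEMENTS = {
    "DCGM_FI_DEV_PCIE_TX_THROUGHPUT": "DCGM_FI_PROF_PCIE_TX_BYTES",
    "DCGM_FI_DEV_PCIE_RX_THROUGHPUT": "DCGM_FI_PROF_PCIE_RX_BYTES",
}


def swap_deprecated_fields(csv_text: str) -> str:
    """Rename deprecated fields in counters-CSV text (hypothetical helper).

    Only the first comma-separated column is inspected; the metric type and
    help text are left untouched. Commented-out lines (starting with '#')
    never match a key in REPLACEMENTS, so they pass through unchanged.
    """
    out = []
    for line in csv_text.splitlines():
        field, sep, rest = line.partition(",")
        new_name = REPLACEMENTS.get(field.strip())
        if new_name is not None:
            line = new_name + sep + rest
        out.append(line)
    return "\n".join(out)


# Hypothetical excerpt in the assumed "field, type, help" layout.
SAMPLE = """\
# Illustrative lines only; help strings are invented.
DCGM_FI_DEV_GPU_TEMP, gauge, GPU temperature.
DCGM_FI_DEV_PCIE_TX_THROUGHPUT, counter, PCIe TX traffic.
DCGM_FI_DEV_PCIE_RX_THROUGHPUT, counter, PCIe RX traffic.
"""

if __name__ == "__main__":
    print(swap_deprecated_fields(SAMPLE))
```

The sketch deliberately keeps the metric type and help text as they were, so those columns would still need a manual check after the rename.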

Is there a reason why DCGM_FI_DEV_PCIE_TX_THROUGHPUT hasn't been replaced with DCGM_FI_PROF_PCIE_TX_BYTES in the CSV files under ./etc?

If it would be better to change it, I can create a pull request.

glowkey commented 3 months ago

Thanks for reporting this. We are planning to update the default watchlist for the next major release of DCGM-Exporter in the coming months and this change is included. Feel free to create an MR in the meantime if that is helpful for your site.

koshieguchi commented 2 months ago

Thanks for the reply!

> We are planning to update the default watchlist for the next major release of DCGM-Exporter in the coming months and this change is included.

That's good to hear!

> Feel free to create an MR in the meantime if that is helpful for your site.

I've created #357 to address this.