Open Julia-elsammak opened 6 months ago
Tagging @cep21 and @cswatt who have worked on this before, if you'd be so kind as to have a look please.
All of those fixes seem reasonable. As Datadog is officially supporting the NVIDIA DCGM Exporter now, I've deprecated the nvml plugin internally. It may be best to mark it as deprecated here as well. Someone could also modify the plugin to refuse to install on newer Datadog versions, but I won't have time to contribute this.
datadog-agent updates have broken this integration for me as well. I've been able to use the DCGM exporter but it requires running the DCGM exporter container which is less than ideal if it's a machine that doesn't run Docker.
While it's not an optimal workaround, I've made the check work using the pure-Python Protobuf implementation:
PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python agent check nvml
...
Running Checks
==============
nvml (1.0.9)
------------
Instance ID: nvml:b6f35e1900952b0b [OK]
Configuration Source: file:/etc/datadog-agent/conf.d/nvml.yaml
Total Runs: 1
Metric Samples: Last Run: 0, Total: 0
Events: Last Run: 0, Total: 0
Service Checks: Last Run: 0, Total: 0
Average Execution Time : 1ms
Last Execution Date : 2024-11-11 12:28:56 UTC (1731328136000)
Last Successful Execution Date : 2024-11-11 12:28:56 UTC (1731328136000)
Metadata
========
config.hash: nvml:b6f35e1900952b0b
config.provider: file
Check has run only once, if some metrics are missing you can try again with --check-rate to see any other metric if available.
This check type has 1 instances. If you're looking for a different check instance, try filtering on a specific one using the --instance-filter flag or set --discovery-min-instances to a higher value
This means that it would need to be applied at the Agent level for all checks, I guess. I'm not aware of a way to use the non-C++ implementation for this check only.
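To make the scoping concrete, here is a minimal sketch of the same workaround driven from Python. It only affects the one-off `agent check nvml` process it launches (and that process's children), not the long-running Agent, so it doesn't solve the per-check question above. The helper names are hypothetical, and it assumes the `agent` binary is on `PATH`.

```python
import os
import subprocess

# Environment variable read by the protobuf runtime to select its backend.
ENV_VAR = "PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION"


def pure_python_protobuf_env(base_env=None):
    """Return a copy of the environment with protobuf forced to pure Python.

    The input mapping is not modified, so the override stays local to
    whatever process is started with the returned dict.
    """
    env = dict(os.environ if base_env is None else base_env)
    env[ENV_VAR] = "python"
    return env


def run_nvml_check():
    """Run `agent check nvml` once with the override applied.

    Assumption: the Datadog Agent CLI is installed and reachable as `agent`.
    """
    return subprocess.run(
        ["agent", "check", "nvml"],
        env=pure_python_protobuf_env(),
        check=False,
    )
```

Calling `run_nvml_check()` should be equivalent to the shell one-liner above. Applying the override to every scheduled check would still mean setting the variable in the Agent service's own environment.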
Trying to solve the issue at the root, I think we can release a new patch version of nvml that regenerates the Python protobuf code, with something like:
$ protoc --python_out=nvml/datadog_checks/nvml nvml/datadog_checks/nvml/api.proto
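For anyone scripting the regeneration, a rough sketch follows. It assumes `protoc` is installed and that the generated module should land next to `api.proto`. Without `--proto_path`, `protoc` mirrors the input's relative path under the output directory, so the sketch passes the `.proto` file's own directory as the proto path. The function names are hypothetical.

```python
import subprocess
from pathlib import Path


def protoc_command(proto_file, protoc="protoc"):
    """Build a protoc invocation that writes the *_pb2.py next to the .proto.

    --proto_path is set to the file's directory so the output is not nested
    under a copy of the input's relative path.
    """
    proto = Path(proto_file)
    proto_dir = proto.parent.as_posix()
    return [
        protoc,
        f"--proto_path={proto_dir}",
        f"--python_out={proto_dir}",
        proto.as_posix(),
    ]


def regenerate(proto_file="nvml/datadog_checks/nvml/api.proto"):
    """Run protoc from the repository root; raises if protoc exits non-zero."""
    subprocess.run(protoc_command(proto_file), check=True)
```

The `protoc` version used determines which protobuf runtime versions the generated code works with, so it is worth checking the result against the Agent's bundled runtime.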
FYI, I've opened #2535, tested against Datadog Agent v7.59.0.
Output of the info page
When installing the NVML integration, I get the following error:
Loading Errors
Looking at the debug logs
To fix this issue: