DataDog / datadog-agent

Main repository for Datadog Agent
https://docs.datadoghq.com/
Apache License 2.0

[BUG] Jetson integration does not handle Jetson Orin #17678

Status: Open. damonmaria opened this issue 1 year ago.

damonmaria commented 1 year ago

Agent Environment:
Machine: NVIDIA Jetson Orin NX 16GB

# kubectl exec -it -n monitoring datadog-2h7wv -- agent version
Defaulted container "agent" out of: agent, trace-agent, process-agent, init-volume (init), init-config (init)
Agent 7.45.0 - Commit: 964e770 - Serialization version: v5.0.81 - Go version: go1.19.9
# cat /etc/nv_tegra_release
# R35 (release), REVISION: 3.1, GCID: 32827747, BOARD: t186ref, EABI: aarch64, DATE: Sun Mar 19 15:19:21 UTC 2023
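When reporting a tegrastats format change like this one, it helps to record the exact L4T release the device runs, because the output format is tied to the BSP version. The following is a minimal sketch of how that version could be pulled out of /etc/nv_tegra_release. The helper names are hypothetical and not part of the Datadog Agent, and the pattern assumes the first line has the shape quoted above ("# R35 (release), REVISION: 3.1, ...").

```python
import re

# Hypothetical helper for illustration only; not part of the Datadog Agent.
# Assumes the first line of /etc/nv_tegra_release looks like:
#   "# R35 (release), REVISION: 3.1, GCID: ..., BOARD: t186ref, ..."
RELEASE_RE = re.compile(r"^# R(\d+) \(release\), REVISION: (\d+(?:\.\d+)*)")


def parse_l4t_release(first_line):
    """Return the L4T version as a string such as "35.3.1", or None if unrecognized."""
    m = RELEASE_RE.match(first_line.strip())
    if m is None:
        return None
    # Major release ("35") joined with the revision ("3.1").
    return f"{m.group(1)}.{m.group(2)}"


def read_l4t_release(path="/etc/nv_tegra_release"):
    """Read the first line of the release file and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_l4t_release(f.readline())


if __name__ == "__main__":
    line = ("# R35 (release), REVISION: 3.1, GCID: 32827747, BOARD: t186ref, "
            "EABI: aarch64, DATE: Sun Mar 19 15:19:21 UTC 2023")
    print(parse_l4t_release(line))
```

For the release line in this issue, the helper yields the version string 35.3.1.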

Describe what happened: The Jetson check fails with the error "could not parse voltage fields" because the regex in the check's parsing code cannot parse the output of tegrastats.

We are using the latest Jetson model, the Orin. On this device, tegrastats prints its output in a different format from previous models, and the Jetson integration cannot handle it:

~# tegrastats
06-16-2023 10:45:09 RAM 6334/15388MB (lfb 1770x4MB) SWAP 491/7694MB (cached 0MB) CPU [6%@729,9%@729,5%@729,16%@729,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@611 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@45.812C CPU@47.937C SOC2@46.093C SOC0@46.968C CV1@46.406C GPU@45.875C tj@48.875C SOC1@48.875C CV2@45.75C VDD_IN 5299mW/5299mW VDD_CPU_GPU_CV 773mW/773mW VDD_SOC 1424mW/1424mW
06-16-2023 10:45:10 RAM 6335/15388MB (lfb 1770x4MB) SWAP 491/7694MB (cached 0MB) CPU [16%@1344,11%@1344,12%@1331,14%@1497,off,off,off,off] EMC_FREQ 0%@2133 GR3D_FREQ 0%@611 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@45.937C CPU@47.843C SOC2@46.218C SOC0@46.875C CV1@46.343C GPU@46C tj@48.812C SOC1@48.812C CV2@45.625C VDD_IN 5218mW/5258mW VDD_CPU_GPU_CV 691mW/732mW VDD_SOC 1424mW/1424mW
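To show which fields a parser has to cope with on the Orin, here is a minimal sketch that extracts the power rails, temperatures, and CPU cores from the sample output above. It is written in Python for brevity and is not the Agent's actual implementation or the fix that was eventually merged. The patterns are assumptions inferred only from these two sample lines: power rails are printed as NAME current/average with an mW suffix (the sketch also accepts the values without the suffix), sensor names can be lowercase (tj), readings can lack a decimal part (GPU@46C), and inactive CPU cores are reported as off.

```python
import re

# Hypothetical patterns for illustration only; NOT the regexes used by the
# Datadog Jetson check. They are inferred from the Orin sample output.

# Power rails, e.g. "VDD_IN 5218mW/5258mW" (current/average).
# The "mW" suffix is treated as optional (an assumption about other formats).
# The trailing \b keeps "RAM 6335/15388MB" from matching.
RAIL_RE = re.compile(r"\b([A-Z][A-Z0-9_]*) (\d+)(?:mW)?/(\d+)(?:mW)?\b")

# Temperatures, e.g. "CPU@47.843C", "GPU@46C", "tj@48.812C".
TEMP_RE = re.compile(r"\b([A-Za-z][A-Za-z0-9_]*)@(\d+(?:\.\d+)?)C\b")

# CPU block, e.g. "CPU [16%@1344,11%@1344,...,off,off]".
CPU_RE = re.compile(r"CPU \[([^\]]*)\]")


def parse_rails(line):
    """Return {rail: (current_mW, average_mW)} for every power rail in the line."""
    return {name: (int(cur), int(avg)) for name, cur, avg in RAIL_RE.findall(line)}


def parse_temps(line):
    """Return {sensor: degrees_C} for every temperature reading in the line."""
    return {name: float(value) for name, value in TEMP_RE.findall(line)}


def parse_cpus(line):
    """Return (load_pct, freq_mhz) per core, or None for cores reported as "off"."""
    m = CPU_RE.search(line)
    if m is None:
        return []
    cores = []
    for field in m.group(1).split(","):
        if field == "off":
            cores.append(None)
        else:
            load, freq = field.split("%@")
            cores.append((int(load), int(freq)))
    return cores


# Second sample line from the Orin output above.
ORIN_SAMPLE = (
    "06-16-2023 10:45:10 RAM 6335/15388MB (lfb 1770x4MB) SWAP 491/7694MB (cached 0MB) "
    "CPU [16%@1344,11%@1344,12%@1331,14%@1497,off,off,off,off] EMC_FREQ 0%@2133 "
    "GR3D_FREQ 0%@611 GR3D2_FREQ 0%@0 VIC_FREQ 729 APE 174 CV0@45.937C CPU@47.843C "
    "SOC2@46.218C SOC0@46.875C CV1@46.343C GPU@46C tj@48.812C SOC1@48.812C CV2@45.625C "
    "VDD_IN 5218mW/5258mW VDD_CPU_GPU_CV 691mW/732mW VDD_SOC 1424mW/1424mW"
)

if __name__ == "__main__":
    print(parse_rails(ORIN_SAMPLE))
    print(parse_temps(ORIN_SAMPLE))
    print(parse_cpus(ORIN_SAMPLE))
```

Making the unit suffix optional and the name and decimal patterns permissive lets one pattern cover both sample lines. A real check would still need fixtures for every supported model, which is what the integration's test is meant to cover.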

Describe what you expected: The Jetson check should work on a Jetson Orin with the latest JetPack.

Steps to reproduce the issue: Enable the Jetson integration on a Jetson Orin.

Additional environment details (Operating System, Cloud provider, etc): The test for the Jetson integration lists the different models of Jetson that it handles.

I've provided sample output from the Orin above. If the test can handle that output, then that's all that's needed.

heyryanw commented 8 months ago

Same issue.

NouemanKHAL commented 1 month ago

Hi @damonmaria and @heyryanw, thank you for reporting this issue!

The error should be fixed by https://github.com/DataDog/datadog-agent/pull/29925, which should be available in the next Agent release, version 7.60.0.

Thank you!

Let me know if you have any questions/comments.