facebookincubator / dynolog

Dynolog is a telemetry daemon for performance monitoring and tracing. It exports metrics from different components in the system, such as the Linux kernel, CPU, disks, Intel PT, and GPUs. Dynolog also integrates with PyTorch and can trigger traces for distributed training applications.
MIT License

fix pre-defined offcore response counters on CLX #251

Closed Alston-Tang closed 2 months ago

Alston-Tang commented 2 months ago

Summary: We rely on the JSON-generated event OFFCORE_RESPONSE to get the code and umask for the offcore response event. However, this event is missing on the CLX architecture, which causes TwTaskMonitor to fail to create the DynoPerfCounter::DRAM_ACCESS_READS event that monitors per-task memory read bandwidth.

On the CLX architecture, we can use an equivalent event, OCR.ALL_PF_RFO.L3_MISS.ANY_SNOOP, to get the same code and umask.
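
The fallback described above can be sketched roughly as follows. This is only an illustration of the lookup order, not the actual Dynolog change: `EventEncoding`, `resolve_encoding`, and the two event tables are hypothetical names, and the code/umask values are placeholders rather than values read from the real event JSON.

```python
from typing import Mapping, NamedTuple, Optional, Sequence


class EventEncoding(NamedTuple):
    """Raw encoding of a perf event (hypothetical helper type)."""
    code: int
    umask: int


# Candidate event names in priority order: the generic name first,
# then the CLX equivalent named in this PR.
OFFCORE_CANDIDATES = (
    "OFFCORE_RESPONSE",
    "OCR.ALL_PF_RFO.L3_MISS.ANY_SNOOP",
)


def resolve_encoding(
    event_table: Mapping[str, EventEncoding],
    candidates: Sequence[str] = OFFCORE_CANDIDATES,
) -> Optional[EventEncoding]:
    """Return the encoding of the first candidate present in the table."""
    for name in candidates:
        encoding = event_table.get(name)
        if encoding is not None:
            return encoding
    # No candidate exists for this architecture; the caller decides how to fail.
    return None


# Placeholder tables standing in for the JSON-generated event lists.
# The numeric values are assumptions for illustration only.
generic_table = {"OFFCORE_RESPONSE": EventEncoding(code=0xB7, umask=0x01)}
clx_table = {"OCR.ALL_PF_RFO.L3_MISS.ANY_SNOOP": EventEncoding(code=0xB7, umask=0x01)}

print(resolve_encoding(generic_table))
print(resolve_encoding(clx_table))
```

Keeping the candidates in an ordered list means a further architecture-specific alias could be appended without touching the lookup logic.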

Reviewed By: bigzachattack

Differential Revision: D56080910

facebook-github-bot commented 2 months ago

This pull request was exported from Phabricator. Differential Revision: D56080910

facebook-github-bot commented 2 months ago

This pull request has been merged in facebookincubator/dynolog@bfa52e356371ec407f0a2b9e7ef0ff1f90c2474f.