RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

Add support for Nvidia Grace #585

Open TomTheBear opened 8 months ago

TomTheBear commented 8 months ago

This PR also contains some fixes for the COMPILER setting GCCARM.

Documentation: https://docs.nvidia.com/grace-performance-tuning-guide.pdf

Uncore events are based on perf data in /sys/devices/nvidia*/events.
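For readers who want to see which event files the kernel exposes, here is a minimal sketch in C that expands a sysfs glob pattern and prints each match. The helper name `list_perf_events` is hypothetical and is not part of LIKWID; the sketch only assumes that the files sit under the pattern quoted above.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>

/*
 * Hypothetical helper (not LIKWID code): print every path that matches
 * `pattern` to `out`, one per line.
 * Returns the number of matches, 0 if nothing matched, -1 on other errors.
 */
static int list_perf_events(const char *pattern, FILE *out)
{
    assert(pattern != NULL && out != NULL);

    glob_t g = {0};
    int rc = glob(pattern, 0, NULL, &g);
    if (rc == GLOB_NOMATCH) {
        globfree(&g);
        return 0;               /* no such devices on this machine */
    }
    if (rc != 0) {
        globfree(&g);
        return -1;              /* read error or out of memory */
    }

    int n = (int)g.gl_pathc;
    for (size_t i = 0; i < g.gl_pathc; i++) {
        fprintf(out, "%s\n", g.gl_pathv[i]);
    }
    globfree(&g);
    return n;
}
```

Calling `list_perf_events("/sys/devices/nvidia*/events/*", stdout)` on a Grace node should list one file per exposed event; on other machines it simply returns 0.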

TomTheBear commented 5 months ago

https://developer.arm.com/documentation/102375/latest/

TomTheBear commented 5 months ago

We identified a problem on GraceGrace systems with the current PR. The memory controller devices SCF* cover only the first socket. The second socket is not yet supported.

Thanks @JanLJL for finding this.

TomTheBear commented 4 months ago

The last commit fixes the Uncore device handling in perf_event mode on GraceGrace systems. Commonly, a single Uncore unit covers all sockets, but on Nvidia GraceGrace there are separate devices per socket.

Moreover, the GraceGrace test system reports CPU socket IDs 36 and 2364, which confused the lock setup.
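To illustrate why sparse socket IDs can be a problem, here is a minimal sketch in C. It assumes, without confirmation from this thread, that per-socket locks live in a small array indexed by socket ID, so raw IDs such as 36 and 2364 would fall outside it. The sketch translates raw IDs to dense indices first. `SocketMap`, `socket_map_index`, and `MAX_SOCKETS` are hypothetical names and do not reflect the actual fix in this PR.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SOCKETS 4   /* hypothetical upper bound for this sketch */

typedef struct {
    int ids[MAX_SOCKETS];   /* raw socket IDs as reported by the system */
    size_t count;           /* number of distinct IDs seen so far */
} SocketMap;

/*
 * Return the dense index (0, 1, ...) for `raw_id`, registering the ID the
 * first time it is seen. Returns -1 if more than MAX_SOCKETS distinct IDs
 * show up.
 */
static int socket_map_index(SocketMap *map, int raw_id)
{
    assert(map != NULL);

    for (size_t i = 0; i < map->count; i++) {
        if (map->ids[i] == raw_id) {
            return (int)i;      /* already known */
        }
    }
    if (map->count >= MAX_SOCKETS) {
        return -1;              /* table full */
    }
    map->ids[map->count] = raw_id;
    return (int)map->count++;
}
```

With this mapping, a lock array of size `MAX_SOCKETS` can be indexed by the returned value instead of the raw ID, whatever numbering the platform reports.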