RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

Summary table of likwid-perfctr shows incorrect values for "intensive" metrics #539

Open rrzeschorscherl opened 1 year ago

rrzeschorscherl commented 1 year ago

Bug description

likwid-perfctr incorrectly reports some metrics by adding up core- or socket-local values. This happens, e.g., with:

These are "intensive" quantities, i.e., they do not scale with the size of the machine and must be "averaged" (not literally, of course) in a way that matches their definition. In contrast, "extensive" quantities such as energy consumption, memory data volume, etc., can be added across the machine to yield a useful number.
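To make the distinction concrete, here is a minimal sketch using CPI as the intensive metric. The per-core counter readings are made-up numbers for illustration, not LIKWID output, and the helper name `cpi` is hypothetical. It contrasts adding up per-core CPI values with summing the extensive raw counts first and forming the ratio afterwards.

```python
# Hypothetical per-core counter readings (made-up numbers, not LIKWID output).
cores = [
    {"cycles": 2_000_000, "instructions": 1_000_000},  # per-core CPI 2.0
    {"cycles": 1_000_000, "instructions": 2_000_000},  # per-core CPI 0.5
]


def cpi(cycles, instructions):
    """Cycles per instruction for one set of counts."""
    return cycles / instructions


per_core_cpi = [cpi(c["cycles"], c["instructions"]) for c in cores]

# Adding intensive values: the result grows with the number of cores
# and does not describe the machine.
summed_cpi = sum(per_core_cpi)

# Sum the extensive raw counts first, then form the ratio once.
total_cycles = sum(c["cycles"] for c in cores)
total_instructions = sum(c["instructions"] for c in cores)
machine_cpi = cpi(total_cycles, total_instructions)

print(f"sum of per-core CPI: {summed_cpi}")
print(f"CPI from summed counts: {machine_cpi}")
```

With these numbers the summed value is 2.5 while the ratio of the summed counts is 1.0. Only the latter stays in a sensible range when more cores are added.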

To Reproduce

Suggestion

TomTheBear commented 10 months ago

Thanks for your suggestion. I thought about it, but it will not make it into the upcoming 5.3 version.

While the internal calculator would already support functions like SUM(X,Y,Z) or MIN(X,Y,Z), integrating data from other threads can be problematic, especially in the MarkerAPI, where each thread updates its own values. One has to synchronize the threads after the counter readings to ensure valid metric values.

To reduce the changes to the internal calculator, one could use a two-step approach. When creating the internal group structure, we could expand the proposed syntax SUM(<countername>, <topological-info>) to SUM(<countername>_<hw0>, <countername>_<hw1>, ...), with <hw*> being the HW threads responsible for the topological level. This way, we can still use the internal calculator for the final calculation. Of course, it still increases the work in each metric evaluation, because we would need to fill the variables map (countername -> value) with the values of all HW threads. On modern systems with hundreds of HW threads, this will cause quite some overhead.
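A rough sketch of that expansion step, written in Python for brevity rather than in LIKWID's own C code. Everything here is an assumption for illustration: the `TOPOLOGY` map, the level names `socket0`/`socket1`, the helper names `expand_formula` and `build_variables`, and the use of `FIXC0`/`FIXC1` as example counter names. It only shows the string rewrite and the size of the resulting variables map, not the actual calculator.

```python
import re

# Hypothetical topology: level name -> HW thread IDs belonging to that level.
TOPOLOGY = {"socket0": [0, 1, 2, 3], "socket1": [4, 5, 6, 7]}

# Matches the proposed two-argument form, e.g. SUM(FIXC1, socket0).
AGG_CALL = re.compile(r"\b(SUM|MIN)\(\s*(\w+)\s*,\s*(\w+)\s*\)")


def expand_formula(formula, topology):
    """Step 1: rewrite FUNC(counter, level) into FUNC(counter_hw0, counter_hw1, ...)."""
    def repl(match):
        func, counter, level = match.groups()
        args = ", ".join(f"{counter}_{hw}" for hw in topology[level])
        return f"{func}({args})"
    return AGG_CALL.sub(repl, formula)


def build_variables(readings, counters, hw_threads):
    """Step 2: fill the variables map (name -> value) for every HW thread involved."""
    return {f"{c}_{hw}": readings[hw][c] for c in counters for hw in hw_threads}


formula = "SUM(FIXC1, socket0)/SUM(FIXC0, socket0)"
expanded = expand_formula(formula, TOPOLOGY)

# Made-up readings: every HW thread reports the same two counters.
readings = {hw: {"FIXC0": 100, "FIXC1": 200} for hw in range(8)}
variables = build_variables(readings, ["FIXC0", "FIXC1"], TOPOLOGY["socket0"])

print(expanded)
print(len(variables), "variables needed for one metric evaluation")
```

The variables map grows with (number of counters) x (number of HW threads in the level), which is where the per-evaluation overhead mentioned above comes from.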

Moreover, this does not change how the statistics table is calculated, and it is questionable whether the table is still required at all. All threads would have the same CPI, Clock, etc., so calculating min, max, and mean makes no sense for those metrics. Otherwise, one would have to magically transform SUM(cycles, all HW threads) into, e.g., MIN(cycles, all HW threads) and recalculate for the statistics table.