RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

RAPL Energy Unit not correct on Haswell EP DRAM domain #39

Closed rschoene closed 8 years ago

rschoene commented 8 years ago

We have a user who reports over-estimated power readings for the Haswell-EP DRAM domain (by a factor of 4). I had a look at the source code and found the bug. You determine the energy unit only by reading MSR_RAPL_POWER_UNIT. In [1, Section 5.3.3], Intel states that the "ENERGY UNIT for DRAM domain is 15.3 μJ". This value should be used for the uncore PCI register RAPL readings. Hackenberg et al. describe in [2, Section IV] that this energy unit also applies to the DRAM RAPL readings from MSRs.
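To illustrate where a factor of about 4 can come from, here is a minimal sketch in C. It assumes the usual layout of MSR_RAPL_POWER_UNIT, with the energy status unit (ESU) in bits 12:8 and one counter increment equal to 1/2^ESU joules. The function names are hypothetical and are not taken from the likwid sources.

```c
#include <stdint.h>

/* Fixed DRAM energy unit quoted from the Intel datasheet [1]. */
#define HASWELL_EP_DRAM_ENERGY_UNIT_J 15.3e-6

/* Decode the generic energy unit from a raw MSR_RAPL_POWER_UNIT value.
 * Assumption: the energy status unit (ESU) sits in bits 12:8 and one
 * counter increment corresponds to 1 / 2^ESU joules. */
double rapl_energy_unit_joules(uint64_t power_unit_msr)
{
    unsigned esu = (unsigned)((power_unit_msr >> 8) & 0x1F);
    return 1.0 / (double)(1ULL << esu);
}

/* Ratio between the generic unit and the fixed DRAM unit. This is the
 * factor by which DRAM energy is over-estimated when the generic unit
 * is applied to the DRAM counter. */
double dram_overestimation_factor(uint64_t power_unit_msr)
{
    return rapl_energy_unit_joules(power_unit_msr)
           / HASWELL_EP_DRAM_ENERGY_UNIT_J;
}
```

With an ESU of 14 (an example value, not read from real hardware), the generic unit is about 61 μJ, roughly four times the 15.3 μJ DRAM unit.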

Best, Robert

[1] Intel® Xeon® Processor E5-1600 and E5-2600 v3 Product Families, Volume 2 of 2, Registers Datasheet, Intel Corp.
[2] An Energy Efficiency Feature Survey of the Intel Haswell Processor, Hackenberg et al., Parallel and Distributed Processing Symposium Workshop (IPDPSW), 2015

TomTheBear commented 8 years ago

Hi,

You didn't read the code far enough. In line 201, the energy unit for the Haswell EP DRAM domain is changed to 15.3 uJ.

See: https://github.com/RRZE-HPC/likwid/blob/master/src/power.c#L197-L202

The code first sets the energy unit for all domains and later handles special cases like the Haswell EP DRAM domain.
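The set-then-override pattern described here could look roughly like the following sketch. It is not the actual code from power.c: the struct, enum, and function names are hypothetical, and the MSR layout (energy status unit in bits 12:8) is an assumption.

```c
#include <stdint.h>

/* Hypothetical domain list, not likwid's actual definitions. */
enum rapl_domain { DOMAIN_PKG, DOMAIN_PP0, DOMAIN_PP1, DOMAIN_DRAM, NUM_DOMAINS };

struct power_info {
    double energy_unit[NUM_DOMAINS]; /* joules per counter increment */
};

/* Step 1: give every domain the generic unit from MSR_RAPL_POWER_UNIT.
 * Step 2: override the special case (Haswell EP DRAM uses 15.3 uJ). */
void init_energy_units(struct power_info *info, uint64_t power_unit_msr,
                       int is_haswell_ep)
{
    unsigned esu = (unsigned)((power_unit_msr >> 8) & 0x1F);
    double generic = 1.0 / (double)(1ULL << esu);

    for (int d = 0; d < NUM_DOMAINS; d++)
        info->energy_unit[d] = generic;

    if (is_haswell_ep)
        info->energy_unit[DOMAIN_DRAM] = 15.3e-6;
}

/* Convert two readings of a 32-bit energy counter to joules. Unsigned
 * subtraction yields the correct delta across a single wrap-around. */
double counter_delta_to_joules(const struct power_info *info,
                               enum rapl_domain d,
                               uint32_t before, uint32_t after)
{
    uint32_t delta = after - before;
    return (double)delta * info->energy_unit[d];
}
```

Any tool that converts raw counters itself has to use the per-domain unit as well. If it applies the generic unit to the DRAM counter, it reintroduces the over-estimation.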

Greetings, Thomas

rschoene commented 8 years ago

My bad. I didn't realize that the GitHub search is only "Showing the top two matches."

TomTheBear commented 8 years ago

So I assume I can close the issue?

rschoene commented 8 years ago

Found the real bug in likwid-powermeter and opened a pull request.

TomTheBear commented 8 years ago

Merged already. I also started reading the power-related code, but I didn't spot this one. Thanks for your contribution.