TACC / tacc_stats

TACC Stats is an automated resource-usage monitoring and analysis package.
GNU Lesser General Public License v2.1

Override required for RAPL DRAM Energy units on Haswell/Broadwell (and probably KNL) #9

Closed jdmccalpin closed 1 year ago

jdmccalpin commented 7 years ago

Intel’s documents for Haswell and Broadwell say that the RAPL energy unit for the DRAM is 15.3 micro-Joules (1/65536 Joules), independent of the RAPL energy unit from the MSR_RAPL_POWER_UNIT register (which returns 61.04 micro-Joules on our Haswell EP systems).
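The difference between the two units can be sketched as follows. This is a minimal illustration, not tacc_stats code: the function names and the sample register value `0x000A0E03` are assumptions chosen for the example, and the decoding assumes the energy status units field occupies bits 12:8 of MSR_RAPL_POWER_UNIT, giving a unit of 1/2^ESU Joules.

```python
# Fixed DRAM energy unit from the issue text: 1/65536 J (about 15.3 micro-Joules).
DRAM_ENERGY_UNIT_J = 1.0 / 65536


def energy_unit_from_msr(rapl_power_unit: int) -> float:
    """Decode the energy status units field (bits 12:8) into Joules per count."""
    esu = (rapl_power_unit >> 8) & 0x1F
    return 1.0 / (1 << esu)


def dram_energy_joules(raw_count: int, rapl_power_unit: int,
                       override_dram_unit: bool) -> float:
    """Convert a raw DRAM energy count to Joules, optionally forcing the fixed unit."""
    if override_dram_unit:
        unit = DRAM_ENERGY_UNIT_J
    else:
        unit = energy_unit_from_msr(rapl_power_unit)
    return raw_count * unit


if __name__ == "__main__":
    sample_msr = 0x000A0E03  # hypothetical register value with ESU = 14
    print(energy_unit_from_msr(sample_msr) * 1e6)         # about 61.04 micro-Joules
    print(dram_energy_joules(65536, sample_msr, True))    # 1.0 J with the override
    print(dram_energy_joules(65536, sample_msr, False))   # 4.0 J without it
```

Without the override, DRAM energy is overstated by a factor of four on the systems described here.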

Confirmation: Using the 15.3 micro-Joule value, I see very reasonable results on Hikari (Xeon E5-2690 v3):

| DRAM config | Idle Socket | Running STREAM |
|---|---|---|
| 1 Single-rank DIMM per channel | 1.07W / DIMM | 3.37W / DIMM |
| 1 Dual-rank DIMM per channel | 0.92W / DIMM | 4.52W / DIMM |

These are per-DIMM results (per-package divided by 4), averaged over 1 minute of wall clock time. They are consistent with my expectations based on the data sheets and on plugging the average bandwidth and page hit/miss rates (both from IMC counters) into the Micron DDR4 Power Estimation spreadsheet.
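The per-DIMM averaging described above can be sketched as below. This is an illustration under stated assumptions, not the measurement script used for the table: the function name and parameters are hypothetical, the energy counter is treated as a 32-bit value that wraps at most once between the two samples, and the default of 4 DIMMs per package matches the divisor mentioned above.

```python
DRAM_ENERGY_UNIT_J = 1.0 / 65536  # fixed DRAM unit from the issue text
COUNTER_MODULUS = 1 << 32         # assumption: 32-bit energy counter


def avg_power_per_dimm(count_start: int, count_end: int, seconds: float,
                       dimms_per_package: int = 4,
                       unit_j: float = DRAM_ENERGY_UNIT_J) -> float:
    """Average Watts per DIMM between two raw per-package counter samples."""
    # Modular subtraction handles a single counter wraparound.
    delta = (count_end - count_start) % COUNTER_MODULUS
    return delta * unit_j / seconds / dimms_per_package


if __name__ == "__main__":
    # 240 J consumed by one package over 60 s, spread across 4 DIMMs.
    print(avg_power_per_dimm(0, 65536 * 240, 60.0))  # 1.0
```

Passing `unit_j=1.0 / 16384` to the same call shows the four-fold overestimate that results from using the MSR_RAPL_POWER_UNIT value for DRAM.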

Observed, but not Documented by Intel: Similar values are seen on our Xeon Phi 7250 nodes (KNL) with 6 DDR4/2400 dual-rank DIMMs per node. When running long STREAM jobs RAPL reports DRAM energy consumption numbers (e.g., about 4.5 Watts per DIMM) that only make sense if you assume that the DRAM energy unit is 15.3 micro-Joules, rather than the 61.04 micro-Joule value from the MSR_RAPL_POWER_UNIT register. I can't find this in any Intel documentation, but I can't find volume 2 of the Xeon Phi x200 datasheet, which is where I would expect to find it (as above with Haswell and Broadwell).

To Do: tacc_stats needs a hard-coded DRAM energy unit for specific processor models. From Table 35-1 in Volume 3 of the Intel Architectures Software Developer’s Manual (document 325384-060, September 2016), the DisplayFamily_DisplayModel values for these processors should be:
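One possible shape for such an override is sketched here. It is not the tacc_stats implementation: the function name is hypothetical, and the set of (family, model) pairs is deliberately left for the caller to supply from the Intel table, so the entry used in the demo is a made-up placeholder rather than a real processor model.

```python
DRAM_ENERGY_UNIT_J = 1.0 / 65536  # fixed DRAM unit from the issue text


def dram_energy_unit(family: int, model: int, msr_unit_j: float,
                     fixed_unit_models: set) -> float:
    """Return the DRAM energy unit in Joules for a given processor.

    fixed_unit_models holds (DisplayFamily, DisplayModel) pairs that need the
    hard-coded unit; every other processor keeps the unit decoded from
    MSR_RAPL_POWER_UNIT.
    """
    if (family, model) in fixed_unit_models:
        return DRAM_ENERGY_UNIT_J
    return msr_unit_j


if __name__ == "__main__":
    # Placeholder entry for illustration only; not a real override value.
    overrides = {(0x06, 0xFF)}
    print(dram_energy_unit(0x06, 0xFF, 1.0 / 16384, overrides))  # fixed unit
    print(dram_energy_unit(0x06, 0x01, 1.0 / 16384, overrides))  # MSR unit
```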

References: