RRZE-HPC / kerncraft

Loop Kernel Analysis and Performance Modeling Toolkit
GNU Affero General Public License v3.0

Benchmark mode cy/CL output for Himeno is off by a factor of 2 #59

Closed (rrzeschorscherl closed this issue 6 years ago)

rrzeschorscherl commented 6 years ago
emmy$ kerncraft -m machine-files/IvyBridgeEP_E5-2660v2.yml --pmodel Benchmark -D L 50 -D M 500 -D N 500 --cache-predictor LC --compiler icc kernels/himeno.c
[...]
Runtime (per cacheline update): 120.84 cy/CL
MEM volume (per repetition): 760030000 Byte
Performance: 4952.08 MFLOP/s
Performance: 145.65 MLUP/s
Performance: 145.65 It/s

The performance output (4952 MFLOP/s) does not match the runtime per CL output:

(16 LUPs / 121 cy) * 34 FLOP/LUP * 2.2 Gcy/s = 9.89 GFLOP/s

This is twice the reported performance number above (up to rounding). Based on my own benchmarks (and the performance analysis provided by the ECM model in kerncraft), I conclude that the cy/CL number is too small by a factor of 2, while the performance output is correct. I suspect this has to do with the Himeno benchmark using single-precision data.
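The mismatch can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not kerncraft code: it assumes a 64-byte cache line and the 2.2 GHz clock used in the formula above, and the function and variable names are made up for illustration. It recomputes cy/CL from the reported MLUP/s for 8-byte and 4-byte elements, and shows that the reported 120.84 cy/CL is what one gets with 8 elements per cache line (double precision), whereas 16 single-precision elements per cache line give roughly twice that.

```python
# Values copied from the kerncraft output quoted in this issue.
CLOCK_HZ = 2.2e9             # 2.2 Gcy/s, as used in the formula above
REPORTED_CY_PER_CL = 120.84  # "Runtime (per cacheline update)"
REPORTED_MLUPS = 145.65      # "Performance: 145.65 MLUP/s"
FLOP_PER_LUP = 34            # FLOP count per lattice update used above

# Assumption: 64-byte cache lines.
CACHELINE_BYTES = 64


def cy_per_cl(mlups: float, bytes_per_element: int) -> float:
    """Cycles per cache line of work, derived from a measured update rate."""
    lups_per_cl = CACHELINE_BYTES // bytes_per_element
    cy_per_lup = CLOCK_HZ / (mlups * 1e6)
    return cy_per_lup * lups_per_cl


def gflops(cy_cl: float, bytes_per_element: int) -> float:
    """GFLOP/s implied by a cy/CL value for a given element size."""
    lups_per_cl = CACHELINE_BYTES // bytes_per_element
    return lups_per_cl / cy_cl * FLOP_PER_LUP * CLOCK_HZ / 1e9


if __name__ == "__main__":
    # 8-byte elements (double): reproduces the reported cy/CL value.
    print("cy/CL assuming double:", round(cy_per_cl(REPORTED_MLUPS, 8), 2))
    # 4-byte elements (float): about twice the reported value.
    print("cy/CL assuming float: ", round(cy_per_cl(REPORTED_MLUPS, 4), 2))
    # Reported cy/CL combined with 16 LUPs/CL overestimates performance.
    print("GFLOP/s from reported cy/CL, float:",
          round(gflops(REPORTED_CY_PER_CL, 4), 2))
```

That the reported number matches the 8-elements-per-line case is consistent with the single-precision suspicion, but it does not by itself show where in kerncraft the element size is taken from.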