kadircs closed this issue 5 months ago
In your theoretical peak calculation, you use the CPU base frequency of 2.2 GHz. This is reasonable, since turbo mode rarely engages with all hardware threads active. In theory, though, the chip can boost up to 3.5 GHz. Does likwid-bench report 2.2 GHz or some higher value? You can also wrap it with likwid-perfctr to get the actual clock frequency. Otherwise, I have no immediate idea.
likwid-bench reports 2.2 GHz, as seen below:
srun --nodes=1 --cpus-per-task=128 --threads-per-core=1 --partition=7773X -t 1-0:00 --hint=nomultithread likwid-bench -t peakflops_avx_fma -W N:2048kB:128
Cycles: 3317739722
CPU Clock: 2200037484
Cycle Clock: 2200037484
Time: 1.508038e+00 sec
Iterations: 134217728
Iterations per thread: 1048576
Inner loop executions: 500
Size (Byte): 2048000
Size per thread: 16000
Number of Flops: 8053063680000
MFlops/s: 5340093.99
Data volume (Byte): 2147483648000
MByte/s: 1424025.06
Cycles per update: 0.012360
Cycles per cacheline: 0.098876
Loads per update: 1
Stores per update: 0
Load bytes per element: 8
Store bytes per elem.: 0
Instructions: 1275068416032
UOPs: 1207959552000
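As a quick sanity check on the output above, the reported rates can be re-derived from the raw counters. The short Python sketch below uses only numbers copied from that output; the variable names are my own. It shows that "MFlops/s" is simply flops divided by time, and that the reported clock is cycles divided by time. If the cycle count comes from a fixed-rate timer (which the identical "CPU Clock" and "Cycle Clock" values suggest, but which I am assuming rather than asserting), then 2.2 GHz here would be the nominal clock and not necessarily the frequency the cores actually ran at.

```python
# Values copied from the likwid-bench output above.
cycles = 3_317_739_722
time_s = 1.508038
flops = 8_053_063_680_000
reported_mflops = 5_340_093.99
reported_clock_hz = 2_200_037_484

# Re-derive the two rates from the raw counters.
derived_mflops = flops / time_s / 1e6
derived_clock_hz = cycles / time_s

# Relative deviation from the printed values (nonzero only because
# the printed time is rounded to seven significant digits).
mflops_rel_err = abs(derived_mflops - reported_mflops) / reported_mflops
clock_rel_err = abs(derived_clock_hz - reported_clock_hz) / reported_clock_hz

print(f"derived MFlops/s: {derived_mflops:.2f} (rel. err {mflops_rel_err:.1e})")
print(f"derived clock Hz: {derived_clock_hz:.0f} (rel. err {clock_rel_err:.1e})")
```

Both derived values agree with the printed ones to within rounding, so the output is at least self-consistent. Any discrepancy with the theoretical peak has to come from the assumed frequency or the assumed flops per cycle.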
Likwid reports ~15% higher peak flops than uprof. Could you please help me find my mistake in running likwid?
Theoretical peak:
128 cores * 16 flop/cycle * 2.2 GHz = 4505.6 Gflop/s DP
L1 cache size is 32 KB. Number of threads is 128.
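To make the comparison concrete, here is a small Python sketch of the peak formula above, along with the core clock that the measured rate would imply. I read the three factors as 128 cores, 16 DP flop per cycle per core, and 2.2 GHz; that reading, and the variable names, are my assumptions. The measured value is the "MFlops/s" line from the likwid-bench output, converted to Gflop/s.

```python
cores = 128
flops_per_cycle = 16        # assumed: DP flop per cycle per core with AVX FMA
base_ghz = 2.2              # base frequency used in the formula above
measured_gflops = 5340.09399  # "MFlops/s" from likwid-bench, divided by 1000

# Theoretical peak at the base frequency.
peak_gflops = cores * flops_per_cycle * base_ghz

# How far the measurement sits above that peak.
ratio = measured_gflops / peak_gflops

# Lowest average core clock that could produce the measured rate,
# assuming every core sustains flops_per_cycle on every cycle.
implied_ghz = measured_gflops / (cores * flops_per_cycle)

print(f"peak at base clock: {peak_gflops:.1f} Gflop/s")
print(f"measured / peak:    {ratio:.3f}")
print(f"implied core clock: {implied_ghz:.3f} GHz")
```

Under these assumptions the measurement is roughly 18.5% above the base-clock peak and corresponds to an average core clock of about 2.6 GHz. That is above the 2.2 GHz base but well below the 3.5 GHz maximum boost, so it would fit with the cores boosting during the run.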