Describe the bug
When measuring memory bandwidth with likwid-perfctr in stethoscope mode and group MEM, the reported memory bandwidth is about half of the memory bandwidth reported by the benchmark and by AMDµProf measurements.
It looks like the 2950X has 4 memory channels, two per die (2 for CCD-0 and 2 for CCD-1).
The system is a Threadripper 2950X with 4x16GB RAM, and all 4 DRAM channels are in use (output of dmidecode --type 17 provided in dmidecode17.txt).
To Reproduce
LIKWID version: 5.2.2 (commit: 233ab943543480cd46058b34616c174198ba0459)
Benchmark:
stress-ng --memcpy 0
Reference measurement:
AMDuProfPcm -m memory -a -q -d 2000
The AMDuProf output (AMDuProf.txt), for example, shows approximately double what LIKWID reports.
To Reproduce with a LIKWID command
Output of the command with -V 3 added: perfctr.txt, topology.txt
Additional context
The Processor Programming Reference (PPR) for AMD Family 17h Models 01h,08h, Revision B2 Processors (p. 161) states that the transferred data bytes are measured per node, which it defines as: "A node, is an integrated circuit device that includes one to 8 cores (one or two Core Complexes)." Does that mean one CCD? likwid-topology detects each of them as a separate die, not as a separate socket.
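To make the suspicion concrete, here is a minimal sketch of the arithmetic, assuming the data-volume counters really are per node and that the result only reflects one of the two dies. The byte counts and runtime are made-up illustration values, not measurements from this system, and the function and variable names are hypothetical.

```python
# Sketch only: hypothetical per-node readings, NOT measured values from this report.

def total_bandwidth_mbytes_s(per_node_bytes, runtime_s):
    """Sum the data volume over all nodes (dies) and convert to MByte/s."""
    if runtime_s <= 0:
        raise ValueError("runtime_s must be positive")
    return sum(per_node_bytes.values()) / runtime_s / 1.0e6

# Assumed: each die moved the same amount of data during the measurement window.
per_node_bytes = {0: 20.0e9, 1: 20.0e9}  # node id -> bytes transferred (made up)
runtime_s = 2.0                          # measurement window in seconds (made up)

# Bandwidth if only node 0 is counted versus both nodes summed.
one_node = total_bandwidth_mbytes_s({0: per_node_bytes[0]}, runtime_s)
both_nodes = total_bandwidth_mbytes_s(per_node_bytes, runtime_s)

print(f"one node:   {one_node:.1f} MByte/s")
print(f"both nodes: {both_nodes:.1f} MByte/s")
```

With evenly distributed traffic, counting a single node yields exactly half of the summed value, which would match the factor of about two seen against the benchmark and AMDuProf.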