RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

[BUG] Zero Energy Output except for the first core #533

Closed: Hongchao92 closed this issue 10 months ago.

Hongchao92 commented 1 year ago

**Describe the bug**

Zero energy output for all cores except the first one.

**To Reproduce**

**To Reproduce with a LIKWID command**

Please supply the output of the command with `-V 3` added to the command:

Options:
-h, --help Help message
-v, --version Version information
-V, --verbose Verbose output, 0 (only errors), 1 (info), 2 (details), 3 (developer)
-c Processor ids to measure (required), e.g. 1,2-4,8
-C Processor ids to pin threads and measure, e.g. 1,2-4,8
For information about the syntax, see likwid-pin
-g, --group Performance group or custom event set string for CPU monitoring
-H Get group help (together with -g switch)
-s, --skip Bitmask with threads to skip
-M <0|1> Set how MSR registers are accessed, 0=direct, 1=accessDaemon
-a List available performance groups
-e List available events and counter registers
-E List available events and corresponding counters that match
-i, --info Print CPU info
-T

... or ...
-m, --marker Use Marker API inside code
Output options:
-o, --output Store output to file. (Optional: Apply text filter according to filename suffix)
-O Output easily parseable CSV instead of fancy tables
--stats Always print statistics table
perf_event specific options:
--perfpid Measure given PID
--execpid Use the PID of wrapped application for measurements
Examples:
List all performance groups: likwid-perfctr -a
List all events and counters: likwid-perfctr -e
List all events and suitable counters for events with 'L2' in them: likwid-perfctr -E L2
Run command on CPU 2 and measure performance group CLOCK: likwid-perfctr -C 2 -g CLOCK ./a.out

* likwid-setFrequencies
* likwid-powermeter

DEBUG - [HPMinit:98] Adjusting functions for x86 architecture in daemon mode
DEBUG - [access_x86_rdpmc_init:156] Test for RDPMC for PMC counters returned 1
DEBUG - [access_x86_rdpmc_init:163] Test for RDPMC for FIXED instruction counter returned 1
DEBUG - [access_x86_rdpmc_init:171] Test for RDPMC for FIXED core cycles counter returned 1
DEBUG - [access_x86_rdpmc_init:179] Test for RDPMC for FIXED reference cycle counter returned 1

* likwid-perscope

likwid-perscope: command not found

Please supply the output of the command with `-d` added to the command line:

* likwid-mpirun

ERROR: No option -n/-np, -nperdomain or -pin

**Additional context**

Add any other context about the problem here.
TomTheBear commented 1 year ago

The energy counters of Intel chips are socket-specific (see here for SKX), so you get only one value for the whole CPU socket, covering all hardware threads (hwthreads) of that socket.

When you run `likwid-mpirun -np 4 ...`, you are probably scheduled on hwthreads 0-3. Only one of them actually reads the counters, in your case hwthread 0.
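The reader selection described above can be sketched in a few lines of Python. This is not LIKWID's code: the function names `select_readers` and `per_thread_report`, the 8-hwthread topology and the energy values are all hypothetical. The sketch only assumes, as stated in this thread, that the first hwthread of each socket in the cpuset reads the socket-wide counter while the other hwthreads show zero.

```python
"""Hypothetical sketch: how a socket-scoped counter (e.g. RAPL PKG energy)
appears in per-hwthread output when only one hwthread per socket reads it."""


def select_readers(cpuset, socket_of):
    """Map each socket to the first hwthread of that socket found in cpuset."""
    readers = {}
    for hwthread in cpuset:
        # setdefault only stores the first hwthread seen for each socket
        readers.setdefault(socket_of[hwthread], hwthread)
    return readers


def per_thread_report(cpuset, socket_of, socket_energy):
    """Per-hwthread energy column: readers show their socket's value, others 0."""
    reader_set = set(select_readers(cpuset, socket_of).values())
    return {
        hw: socket_energy[socket_of[hw]] if hw in reader_set else 0.0
        for hw in cpuset
    }


if __name__ == "__main__":
    # Hypothetical 2-socket node: hwthreads 0-3 on socket 0, 4-7 on socket 1.
    socket_of = {hw: hw // 4 for hw in range(8)}
    # Made-up socket energies in joules, for illustration only.
    socket_energy = {0: 120.0, 1: 95.0}
    # "-np 4" landing on hwthreads 0-3: only hwthread 0 carries a value.
    print(per_thread_report([0, 1, 2, 3], socket_of, socket_energy))
    # {0: 120.0, 1: 0.0, 2: 0.0, 3: 0.0}
```

The value listed for hwthread 0 in this model is the socket total, not hwthread 0's own share.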

Hongchao92 commented 1 year ago

Many thanks for the quick response!

You are right! If I use the following command, I get the energy from one core on each socket (i.e., non-zero energy for 2 of the 4 cores):

`likwid-mpirun -np 4 -nperdomain S:2 -g ENERGY ./$exe input`

So that means that for SKX, only the energy of one core per socket can be reported, even though the code runs on more than one core. Is this correct? The value I got does not seem to be the sum over all used cores on the socket.
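Why `-nperdomain S:2` yields two non-zero values out of four can be shown with a small model. This is a simplified, hypothetical sketch, not likwid-mpirun's actual pinning logic: it assumes each socket domain is filled from its first listed hwthreads, uses a made-up 2-socket node with 4 hwthreads per socket, and `place_per_domain` and `energy_readers` are illustrative names.

```python
"""Hypothetical, simplified model of likwid-mpirun "-np N -nperdomain S:k"."""


def place_per_domain(np_total, per_domain, domains):
    """Pin up to per_domain processes to the first hwthreads of each domain, in order."""
    slots = sum(min(per_domain, len(domain)) for domain in domains)
    if np_total > slots:
        raise ValueError("not enough domain slots for the requested process count")
    placement = []
    for domain in domains:
        remaining = np_total - len(placement)
        if remaining == 0:
            break
        placement.extend(domain[:min(per_domain, remaining)])
    return placement


def energy_readers(placement, domains):
    """First pinned hwthread of each domain: the one that reports PKG energy."""
    readers = []
    for domain in domains:
        members = [hw for hw in placement if hw in domain]
        if members:
            readers.append(members[0])
    return readers


if __name__ == "__main__":
    # Hypothetical 2-socket node: socket 0 = hwthreads 0-3, socket 1 = 4-7.
    sockets = [[0, 1, 2, 3], [4, 5, 6, 7]]
    pinned = place_per_domain(4, 2, sockets)
    print(pinned)                           # [0, 1, 4, 5]
    print(energy_readers(pinned, sockets))  # [0, 4]
```

With `per_domain=4`, the same 4 processes all land on socket 0 and `energy_readers` returns `[0]`, which mirrors the guess that a plain `-np 4` run was scheduled on hwthreads 0-3.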

TomTheBear commented 1 year ago

> So that means for SKX, only the energy of one core on each socket can be reported even though the code runs on more than one cores, is this correct? The value I got seems not to be the sum of all the used cores on the socket.

No, it means that there is only one central counter register for all HW threads of a CPU socket (the PKG domain), and all energy consumed by all HW threads of that socket is counted there. Any HW thread of the socket could read the counter. LIKWID selects the first one on each socket in the given cpuset to do so, but the value covers the consumption of the whole CPU socket.

It might not look like the sum over the used cores because it contains more than just those hw threads: there is a rather constant base consumption, and then there is the consumption of the active hw threads. For the exact definition of what is part of a RAPL domain, you have to check the vendor's documentation.
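The decomposition described in the reply can be written as a rough model. It is an illustration only: it assumes the PKG counter splits cleanly into a roughly constant base term and per-hwthread terms, and the symbols $P_{\mathrm{base}}$, $E_h$, $A(s)$ and $S$ are introduced here, not taken from LIKWID or vendor documentation.

$$
E_{\mathrm{PKG}}(s) \approx P_{\mathrm{base}}(s)\,\Delta t + \sum_{h \in A(s)} E_h,
\qquad
E_{\mathrm{job}} = \sum_{s \in S} E_{\mathrm{PKG}}(s)
$$

Here $\Delta t$ is the measurement interval, $A(s)$ the set of active hwthreads on socket $s$, and $S$ the set of sockets used by the job. Only $E_{\mathrm{PKG}}(s)$ is readable, once per socket, so the reported value is larger than $\sum_{h \in A(s)} E_h$ by the base term, and the job total is one value per socket, not one per process.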