RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

Question: when running `likwid-powermeter a.out`, does it measure the power consumption of the whole CPU or just of `a.out`? #617

Open Sunt-ing opened 5 months ago

Sunt-ing commented 5 months ago

I could not find the answer in the docs or the issue list. Thanks in advance for answering!

TomTheBear commented 5 months ago

I updated the page of likwid-powermeter in the Wiki. Specifically, I added a section which describes which RAPL domains measure what: https://github.com/RRZE-HPC/likwid/wiki/Likwid-Powermeter#common-domains

I also updated the text for this command:

Next you can use likwid-powermeter as a wrapper to output the energy consumed by all domains for the runtime of an application: likwid-powermeter a.out

Does this answer your question? If not, can you please provide feedback and I will update the docs more.

Sunt-ing commented 5 months ago

Thanks for your reply!

I got the following output. My guess at its meaning: my application used two CPUs (CPU 0 and CPU 10); during the run, CPU 0 consumed 3364.62 J and CPU 10 consumed 5068.51 J.

Is my understanding correct? Thanks for your help!

Machine configuration: Two Intel Xeon Silver 4114 10-core CPUs at 2.20 GHz.

Runtime: 100.139 s
Measure for socket 0 on CPU 0
Domain PKG:
Energy consumed: 3364.62 Joules
Power consumed: 33.5996 Watt
Domain PP0:
Energy consumed: 0 Joules
Power consumed: 0 Watt
Domain DRAM:
Energy consumed: 1378.37 Joules
Power consumed: 13.7646 Watt
Domain PLATFORM:
Energy consumed: 0 Joules
Power consumed: 0 Watt

Measure for socket 1 on CPU 10
Domain PKG:
Energy consumed: 5068.51 Joules
Power consumed: 50.615 Watt
Domain PP0:
Energy consumed: 0 Joules
Power consumed: 0 Watt
Domain DRAM:
Energy consumed: 605.982 Joules
Power consumed: 6.05144 Watt
Domain PLATFORM:
Energy consumed: 0 Joules
Power consumed: 0 Watt

TomTheBear commented 5 months ago

Your system has two CPU sockets. Since the energy counters exist only once per socket, one CPU per socket is selected to read them. In your case, those are CPU 0 and CPU 10. The CPU IDs matter less than the sockets they stand for: socket 0 and socket 1.

> my application used two CPUs (CPU 0 and CPU 10); during the run, CPU 0 consumed 3364.62 J; CPU 10 consumed 5068.51 Joules.

My system has two CPUs (socket 0 and socket 1); during the run, the package of socket 0 consumed 3364.62 J and the package of socket 1 consumed 5068.51 J. Moreover, the memory DIMMs attached to socket 0 consumed 1378.37 J and those attached to socket 1 consumed 605.982 J.

(PKG domain = the whole CPU package of the socket, i.e. all cores plus the uncore; DRAM domain = all memory DIMMs attached to the socket)
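
As a quick sanity check on the numbers above: the reported power is the consumed energy divided by the runtime, and the per-socket values can be added up for a machine-wide figure. The sketch below uses only values copied from the posted output; the variable and function names are illustrative and not part of LIKWID.

```python
# Values copied from the likwid-powermeter output posted in this thread.
RUNTIME_S = 100.139

# Energy in joules per socket and RAPL domain (PP0 and PLATFORM reported 0).
energy_joules = {
    0: {"PKG": 3364.62, "DRAM": 1378.37},
    1: {"PKG": 5068.51, "DRAM": 605.982},
}


def average_power(energy_j, runtime_s):
    """Average power in watts: energy in joules divided by runtime in seconds."""
    return energy_j / runtime_s


for socket, domains in energy_joules.items():
    for domain, energy in domains.items():
        watts = average_power(energy, RUNTIME_S)
        print(f"socket {socket} {domain}: {watts:.2f} W")

# Sum over both sockets and both non-zero domains.
total_energy = sum(e for domains in energy_joules.values() for e in domains.values())
print(f"PKG + DRAM, both sockets: {total_energy:.2f} J")
```

These are socket-wide figures for the duration of the run. They cover everything that ran on the sockets in that time, not only `a.out`.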

TomTheBear commented 5 months ago

I added more text to the likwid-powermeter wiki page. Can you please check whether it explains it now? If not, feedback is appreciated.

Sunt-ing commented 5 months ago

Thanks for your answer! Now I know the meaning exactly.

I have another question: I want to do scheduling based on this output. My scheduler will be written in Python, so it looks like the best way is to capture the output text and parse it myself, right? Thanks for your help!
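
For reference, a minimal sketch of what such text parsing could look like. It assumes the output keeps exactly the line format shown earlier in this thread (`Runtime:`, `Measure for socket ... on CPU ...`, `Domain ...:`, `Energy consumed:`, `Power consumed:`); other LIKWID versions may print it differently. The function name `parse_powermeter` and the layout of the returned dictionary are made up for illustration.

```python
import re

# Patterns for the line format shown in this thread (assumed, not guaranteed stable).
NUM = r"([0-9]+(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)"
RUNTIME_RE = re.compile(r"Runtime: " + NUM + r" s")
SOCKET_RE = re.compile(r"Measure for socket (\d+) on CPU (\d+)")
DOMAIN_RE = re.compile(r"Domain (\w+):")
ENERGY_RE = re.compile(r"Energy consumed: " + NUM + r" Joules")
POWER_RE = re.compile(r"Power consumed: " + NUM + r" Watt")


def parse_powermeter(text):
    """Parse likwid-powermeter text output into a nested dict (hypothetical helper)."""
    result = {"runtime_s": None, "sockets": {}}
    domains = None  # domain dict of the socket currently being read
    current = None  # value dict of the domain currently being read
    for raw in text.splitlines():
        line = raw.strip()
        if m := RUNTIME_RE.match(line):
            result["runtime_s"] = float(m.group(1))
        elif m := SOCKET_RE.match(line):
            domains = {}
            current = None
            result["sockets"][int(m.group(1))] = {
                "cpu": int(m.group(2)),
                "domains": domains,
            }
        elif (m := DOMAIN_RE.match(line)) and domains is not None:
            current = domains.setdefault(m.group(1), {})
        elif (m := ENERGY_RE.match(line)) and current is not None:
            current["energy_j"] = float(m.group(1))
        elif (m := POWER_RE.match(line)) and current is not None:
            current["power_w"] = float(m.group(1))
        # Any other line (headers, separators) is ignored.
    return result


# Shortened excerpt of the output posted above.
SAMPLE = """\
Runtime: 100.139 s
Measure for socket 0 on CPU 0
Domain PKG:
Energy consumed: 3364.62 Joules
Power consumed: 33.5996 Watt
Domain PP0:
Energy consumed: 0 Joules
Power consumed: 0 Watt
Domain DRAM:
Energy consumed: 1378.37 Joules
Power consumed: 13.7646 Watt

Measure for socket 1 on CPU 10
Domain PKG:
Energy consumed: 5068.51 Joules
Power consumed: 50.615 Watt
"""

parsed = parse_powermeter(SAMPLE)
print(parsed["sockets"][1]["domains"]["PKG"])
```

In a real scheduler the text would come from running the tool, for example via `subprocess.run(["likwid-powermeter", "a.out"], capture_output=True, text=True).stdout`. A parser like this breaks silently if the output format changes, so it is worth checking that the expected sockets and domains are present before acting on the values.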

TomTheBear commented 5 months ago

Scheduling is a broad term, so I'm not 100% sure what you mean by it. But generally, if you want to integrate LIKWID into something else, I would use the library directly. All operations done by the command line tools are available in the LIKWID API. And luckily for you, there is a Python interface to this API: https://github.com/RRZE-HPC/pylikwid#energy. It might not provide the latest features, but it should work (it's currently in use on the system I'm typing on). If you have questions or problems with the Python LIKWID API, please open an issue in the pylikwid repository.

Sunt-ing commented 5 months ago

My scheduler changes knobs such as the power limit based on the energy consumption of this machine (2 CPUs + 1 GPU). Thanks for your kind help again!

TomTheBear commented 5 months ago

For changing knobs, I recommend using the experimental sysfeatures component introduced with 5.3.0. The old/current APIs do not provide knobs for setting a power limit. You have to explicitly enable sysfeatures before the build (BUILD_SYSFEATURES=true in config.mk). Unfortunately, there is no support in pylikwid yet, but adding it wouldn't be much work. Of course, you could also use the corresponding CLI application likwid-sysfeatures, but changes in its output might require updates to your parser.

The sysfeatures component is the new way to get the energy data. It also provides knobs such as power limits. likwid-powermeter, likwid-setFrequencies and likwid-features will be deprecated in the future because all of their features will be provided by likwid-sysfeatures.

The sysfeatures component is still under development, so if you want to join the effort, let me know.

Sunt-ing commented 5 months ago

I will take a look and use them. Thanks for your careful and informative explanation!