RRZE-HPC / likwid

Performance monitoring and benchmarking suite
https://hpc.fau.de/research/tools/likwid/
GNU General Public License v3.0

[DOCS] Understanding likwid-setFrequencies and reset options #386

Closed pramodk closed 3 years ago

pramodk commented 3 years ago

What are you searching for

I am trying to understand likwid-setFrequencies and reading the documentation from https://github.com/RRZE-HPC/likwid/wiki/likwid-setFrequencies.

This is what I tried from the beginning:

pramod@localhost:~$ likwid-setFrequencies -p
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
HWThread 0: governor  performance min/cur/max 1.0/2.999938/3.7 GHz Turbo 1
HWThread 1: governor  performance min/cur/max 1.0/2.999938/3.7 GHz Turbo 1
HWThread 2: governor  performance min/cur/max 1.0/3.000079/3.7 GHz Turbo 1
HWThread 3: governor  performance min/cur/max 1.0/2.999938/3.7 GHz Turbo 1
HWThread 4: governor  performance min/cur/max 1.0/2.999938/3.7 GHz Turbo 1
HWThread 5: governor  performance min/cur/max 1.0/3.000079/3.7 GHz Turbo 1
HWThread 6: governor  performance min/cur/max 1.0/2.999938/3.7 GHz Turbo 1
HWThread 7: governor  performance min/cur/max 1.0/2.999938/3.7 GHz Turbo 1
HWThread 8: governor  performance min/cur/max 1.0/3.000079/3.7 GHz Turbo 1
...
Current Uncore frequencies:
Socket 0: min/max 1.2/2.4 GHz
Socket 1: min/max 1.2/2.4 GHz

$ likwid-setFrequencies -l
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Cannot get frequencies from cpufreq module
The intel_pstate module allows free selection of frequencies in the available range
Minimal CPU frequency 1.0
Maximal CPU frequency 3.7
$ likwid-setFrequencies -f 2.5
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
ERROR: Setting maximal CPU frequency below base CPU frequency with activated Turbo mode is not supported.
$ likwid-setFrequencies -t 0
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed

$ likwid-setFrequencies -p
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
HWThread 0: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
HWThread 1: governor  performance min/cur/max 1.0/2.299298/2.3 GHz Turbo 0
HWThread 2: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
...

which worked fine.

$ likwid-setFrequencies -x 1.0 -y 3.7
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed

$ likwid-setFrequencies -p
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
HWThread 0: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
HWThread 1: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
HWThread 2: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
HWThread 3: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
...

Why am I not able to change max frequency again?

~$ likwid-setFrequencies -reset -ureset
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Reset to governor performance with min freq. 1.0 GHz and deactivate turbo mode

$ likwid-setFrequencies -p
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
HWThread 0: governor  performance min/cur/max 1.0/2.3/2.3 GHz Turbo 0
HWThread 1: governor  performance min/cur/max 1.0/2.3/2.3 GHz Turbo 0
HWThread 2: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
...
Current Uncore frequencies:
Socket 0: min/max 1.0/3.7 GHz
Socket 1: min/max 1.0/3.7 GHz

I see that Uncore is now reset to 3.7 instead of original min/max 1.2/2.4 GHz. But I don't see any change in freq for HWThread.

$ likwid-setFrequencies -f 3.0
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
$ likwid-setFrequencies -p
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
HWThread 0: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
HWThread 1: governor  performance min/cur/max 1.0/2.3/2.3 GHz Turbo 0
...

Question: I want to undo the changes I made and return to the initial settings. Assuming there is no root access (i.e. without cpupower), how can I restore the original min/cur/max frequencies using likwid commands?

If I have missed an obvious part of the documentation, please let me know.

Thank you in advance!

Note: not sure if it's relevant, but the -l option gives me the warning below as well:

$ likwid-setFrequencies -l
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Cannot get frequencies from cpufreq module
The intel_pstate module allows free selection of frequencies in the available range
Minimal CPU frequency 1.0
Maximal CPU frequency 3.7

TomTheBear commented 3 years ago

LIKWID never stores the state of the CPU and Uncore frequencies, which is why there is nothing like "return to initial settings". Moreover, likwid-setFrequencies refuses to apply some settings. If turbo mode is disabled, the maximal usable frequency is the base frequency, so if you try to set a frequency higher than base with turbo disabled, likwid-setFrequencies will refuse to do it (and should print a message about it, which does not seem to work in all cases). Conversely, if turbo is enabled, it does not make much sense to select a frequency below the base frequency, because turbo will override it and the frequency will not be stable.

So, you started with a system in turbo mode (turbo on, 1.0-3.7 GHz). Then you deactivated turbo mode, so the range of usable frequencies shrank to 1.0-<base_freq> (probably 2.3 GHz). That's why -x 1.0 -y 3.7 didn't work: 3.7 GHz is out of range. The -reset -ureset didn't change anything because your CPU frequencies were already at the reset values. What you want is -t 1 -reset. You haven't changed anything for the Uncore, so using -ureset wasn't needed. Since turbo was still off in your last attempt to pin to 3.0 GHz (-f 3.0), it was ignored as well because it's above the base frequency.

So, if you want to get the initial settings back, you can tell it to do so explicitly -t 1 -x 1.0 -y 3.7 or use -t 1 -reset.

The documentation for -reset contains basically all that info:

Change CPU frequencies to its minimal and maximal frequencies. If not specified, the turbo is deactivated and the maximum non-turbo frequency selected. The turbo reset setting can be set with -t <0|1>. The governor is switched back to the last on the list of available governors (-m) or to the one specified (-g ).

So for your system:

Note: Some frequencies are not available in a machine-readable form, like min/max Uncore frequencies. Therefore, in case of -ureset, it uses the CPU frequencies as basis. That's why your -ureset changed the min Uncore frequency from 1.2 GHz to 1.0 GHz and the max Uncore frequency from 2.4 GHz to 3.7 GHz. The Uncore ignores settings outside of range, so even if you specify 3.7 GHz as maximum, it runs only with 2.4 GHz (observable with likwid-perfctr).
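Since LIKWID does not store the previous state, one way to get back to the initial settings is to save the output of likwid-setFrequencies -p before changing anything and derive the explicit -t/-x/-y command from it later. The following is only a minimal sketch: restore_cmd is a hypothetical helper, it relies on the -p output format shown in this thread, and it assumes all hardware threads had identical settings. It does not cover the Uncore values.

```sh
# Hypothetical helper, not part of LIKWID.
# Reads a saved `likwid-setFrequencies -p` listing on stdin and prints a
# command line that sets turbo, min and max back to the recorded values.
# Only the first "HWThread" line is used (assumes all threads are identical).
restore_cmd() {
    awk '/^HWThread/ {
        for (i = 1; i <= NF; i++) {
            if ($i == "min/cur/max") { split($(i + 1), f, "/") }  # f[1]=min, f[3]=max
            if ($i == "Turbo")       { turbo = $(i + 1) }
        }
        printf "likwid-setFrequencies -t %s -x %s -y %s\n", turbo, f[1], f[3]
        exit
    }'
}

# Usage (file name is arbitrary):
#   likwid-setFrequencies -p > freq_before.txt    # before any change
#   restore_cmd < freq_before.txt                 # prints the restore command
```

For the initial state in this issue (min 1.0, max 3.7, Turbo 1) the helper prints the same -t 1 -x 1.0 -y 3.7 combination suggested above.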

pramodk commented 3 years ago

First of all, thank you very much for the detailed response! (as usual!)

LIKWID never stores the state of CPU and Uncore frequencies that's why there isn't anything like return to initial settings. Moreover, likwid-setFrequencies refuses to execute some settings: If turbo mode is disabled, the maximal usable frequency is the base frequency. So if you want to set a frequency higher than base with turbo disabled, likwid-setFrequencies will refuse to do it (and should print a message about it which does not seem to work in all cases). Moreover, if turbo is enabled,

Understood, makes sense.

You haven't changed anything for the Uncore, so using -ureset wasn't needed. Since turbo was still off in your last attempt to pin to 3.0 GHz (-f 3.0), it got ignored as well because it's above base frequency.

Ok 👍

So, if you want to get the initial settings back, you can tell it to do so explicitly -t 1 -x 1.0 -y 3.7 or use -t 1 -reset.

Based on the above, I tried this:

# starting with these settings

$ likwid-setFrequencies -p
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
HWThread 0: governor  performance min/cur/max 1.0/2.3/2.3 GHz Turbo 0
HWThread 1: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
HWThread 2: governor  performance min/cur/max 1.0/2.3/2.3 GHz Turbo 0
HWThread 3: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
HWThread 4: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 0
....
Current Uncore frequencies:
Socket 0: min/max 1.0/3.7 GHz
Socket 1: min/max 1.0/3.7 GHz

# trying suggested command : reset and turbo on

$ likwid-setFrequencies -t 1 -reset
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Reset to governor performance with min freq. 1.0 GHz and deactivate turbo mode

# Checking frequencies again

$ likwid-setFrequencies -p
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
HWThread 0: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 1
HWThread 1: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 1
HWThread 2: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 1
HWThread 3: governor  performance min/cur/max 1.0/2.299859/2.3 GHz Turbo 1
HWThread 4: governor  performance min/cur/max 1.0/2.3/2.3 GHz Turbo 1

IIUC, I should see a max frequency of 3.7 GHz above. Then I tried the following:

$ likwid-setFrequencies -t 1 -x 1.0 -y 3.7
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
ERROR: Setting maximal CPU frequency below base CPU frequency with activated Turbo mode is not supported.

This is a bit weird, right? Then I started looking at the code (v5.1.0) here https://github.com/RRZE-HPC/likwid/blob/bb074ccb66b3e7f6e5e1f24886511d2ef41bb938/src/applications/likwid-setFrequencies.lua#L467

I debugged a bit and saw:

So max_freq is in kHz but base_freq is in Hz! That causes the above condition to fail, hence the error.

If I am not mistaken, get_base_freq() should multiply the value by 1E3 here https://github.com/RRZE-HPC/likwid/blob/bb074ccb66b3e7f6e5e1f24886511d2ef41bb938/src/applications/likwid-setFrequencies.lua#L145

With this, everything is working as expected. (I have used this in the past without any issues and hence was a bit confused by the earlier behaviour.)

Thank you again for all help!

Note: In the latest master I see that get_base_freq is simplified. I quickly tested the master branch and don't see this problem.

TomTheBear commented 3 years ago

You are right, line 145 should be return freq*1E3. I fixed it for 5.1.1. I changed it also in the fallback function in the master branch.

The master branch reads the data from an MSR and thus requires root access. I tried many things, but no method was as reliable as the MSR data. There is also a CPUID leaf (0x16) for CPU frequency information, but on my AMD nodes the base frequency is always zero.

pramodk commented 3 years ago

Perfect! Thanks for the update. I will try to test the latest master in the coming days. As the original questions are clarified, I will close this issue.

pramodk commented 1 year ago

@TomTheBear : this is an old thread but I am redoing some of the experiments that I mentioned above. I would like to clarify the following message:

WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
...

What does this WARN message mean? If HWP is enabled then likwid-setFrequencies modifies frequencies but they are somehow ignored? Or something else?

On our system what I would like to do is to do some performance measurements for different CPU (core and/or uncore) frequencies and I am checking what I really need to do in order to have reliable measurements.

TomTheBear commented 1 year ago

Intel HWP is a hardware feature which controls the various frequencies based on some internal heuristics. The feature can only be enabled and disabled by the BIOS/UEFI. We saw problems on a test system where it was activated and it limited the frequency much lower than expected. This was independent of using likwid-setFrequencies or the sysfs files directly. After we disabled it, the frequency was as expected.

So I added this warning to let users know that there might be problems when Intel HWP is enabled. The issue is that active Intel HWP in most cases does not cause any unexpected frequency changes, so likwid-setFrequencies should work and the frequency should be stable. Maybe I should reformulate the warning or try to check additional Intel HWP settings before printing it.
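To see independently of LIKWID whether a node exposes HWP, one option is to look for the hwp flag in the flags line of /proc/cpuinfo. The sketch below is an assumption-laden illustration: has_hwp_flag is a hypothetical helper, and the flag only shows that the kernel reports the capability. It is not necessarily the same check that likwid-setFrequencies uses for its warning, and it says nothing about whether HWP actually limits the frequency.

```sh
# Hypothetical helper, not part of LIKWID.
# Returns 0 if the first "flags" line of a cpuinfo-style file contains
# the exact flag "hwp", and 1 otherwise.
# $1: file to inspect (defaults to /proc/cpuinfo).
has_hwp_flag() {
    awk -F: '/^flags/ {
        n = split($2, flag, " ")
        for (i = 1; i <= n; i++) if (flag[i] == "hwp") found = 1
        exit                      # one flags line is enough
    }
    END { exit (found ? 0 : 1) }' "${1:-/proc/cpuinfo}"
}

# Usage:
#   if has_hwp_flag; then echo "HWP capability reported"; fi
```

The exact-match comparison matters because related flags such as hwp_notify also start with "hwp".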

On our system what I would like to do is to do some performance measurements for different CPU (core and/or uncore) frequencies and I am checking what I really need to do in order to have reliable measurements.

Most LIKWID performance groups measure the CPU frequency per hardware thread. You should set the frequency to the desired level, measure the frequency with likwid-perfctr -g CLOCK ... and compare the values afterwards. The clock might also be lowered by other "features" like the lower AVX(512) frequencies on Intel Skylake and later.
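The "set, then compare" step can be partly automated. As a rough first check, the sketch below compares the cur field of likwid-setFrequencies -p against the requested frequency. check_cur_freq is a hypothetical helper that assumes the -p output format shown in this thread, and the 0.05 GHz default tolerance is an arbitrary choice. The cur value is only what the system reports at that moment, so a measurement under load with likwid-perfctr -g CLOCK remains the more reliable comparison.

```sh
# Hypothetical helper, not part of LIKWID.
# Reads a `likwid-setFrequencies -p` listing on stdin and checks that the
# current frequency of every hardware thread is within a tolerance of the
# target. Prints the deviating threads.
# $1: target frequency in GHz
# $2: allowed deviation in GHz (default 0.05, an arbitrary choice)
# Returns 1 if any thread deviates or no HWThread line was found.
check_cur_freq() {
    awk -v target="$1" -v tol="${2:-0.05}" '
    /^HWThread/ {
        for (i = 1; i <= NF; i++)
            if ($i == "min/cur/max") split($(i + 1), f, "/")   # f[2] = cur
        d = f[2] - target
        if (d < 0) d = -d
        seen++
        if (d > tol) {
            printf "%s %s runs at %s GHz, expected %s GHz\n", $1, $2, f[2], target
            bad++
        }
    }
    END { exit ((bad > 0 || seen == 0) ? 1 : 0) }'
}

# Usage:
#   likwid-setFrequencies -t 0 -f 2.2
#   likwid-setFrequencies -p | check_cur_freq 2.2 || echo "frequency not applied"
```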

pramodk commented 1 year ago

thank you very much @TomTheBear for quick answer!

Understood.

I would like to take the liberty of asking one more question:

In the old days I was doing something like the following, and I have this output in my notes:

$ likwid-setFrequencies -t 0; \
for freq in 1.0 1.4 1.8 2.2 2.5; \
do \
    likwid-setFrequencies -t 0 -f $freq; \
    likwid-bench -t peakflops_avx512_fma -w S0:32KB:1 | grep MFlops/s; \
done

MFlops/s: 31451.79
MFlops/s: 43944.46
MFlops/s: 56777.03
MFlops/s: 69436.97
MFlops/s: 78913.09

and I was seeing the desired effect in the reported MFlops/s (shown above). All good.

But today I am not able to reproduce it on our compute nodes and wondering what I am missing. Might be something obvious but I am not able to recall atm 🤔. Here is what I see on the node:

kumbhar@ldir01u01:~$ cpupower frequency-info
analyzing CPU 0:
  driver: intel_pstate
  CPUs which run at the same hardware frequency: 0
  CPUs which need to have their frequency coordinated by software: 0
  maximum transition latency:  Cannot determine or is not supported.
  hardware limits: 1000 MHz - 4.00 GHz
  available cpufreq governors: performance powersave
  current policy: frequency should be within 1000 MHz and 4.00 GHz.
                  The governor "performance" may decide which speed to use
                  within this range.
  current CPU frequency: Unable to call hardware
  current CPU frequency: 3.00 GHz (asserted by call to kernel)
  boost state support:
    Supported: no
    Active: no

and

kumbhar@ldir01u01:~$ likwid-setFrequencies -p
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
HWThread 0: governor  performance min/cur/max 1.0/2.99989/4.0 GHz Turbo 0
HWThread 1: governor  performance min/cur/max 1.0/3.000024/4.0 GHz Turbo 0
HWThread 2: governor  performance min/cur/max 1.0/2.99989/4.0 GHz Turbo 0
HWThread 3: governor  performance min/cur/max 1.0/2.99989/4.0 GHz Turbo 0
HWThread 4: governor  performance min/cur/max 1.0/2.999755/4.0 GHz Turbo 0
HWThread 5: governor  performance min/cur/max 1.0/2.999487/4.0 GHz Turbo 0
HWThread 6: governor  performance min/cur/max 1.0/2.99989/4.0 GHz Turbo 0
HWThread 7: governor  performance min/cur/max 1.0/2.999755/4.0 GHz Turbo 0
HWThread 8: governor  performance min/cur/max 1.0/2.999621/4.0 GHz Turbo 0
HWThread 9: governor  performance min/cur/max 1.0/2.999621/4.0 GHz Turbo 0
...

Current Uncore frequencies:
Socket 0: min/max 1.2/2.4 GHz
Socket 1: min/max 1.2/2.4 GHz
kumbhar@ldir01u01:~$ likwid-setFrequencies -l
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Cannot get frequencies from cpufreq module
The intel_pstate module allows free selection of frequencies in the available range
Minimal CPU frequency 1.0
Maximal CPU frequency 4.0

If I run the likwid-setFrequencies and likwid-bench then I see:

kumbhar@ldir01u01:~$ likwid-setFrequencies -t 0; \
> for freq in 1.0 1.4 1.8 2.2 2.5; \
> do \
>     likwid-setFrequencies -t 0 -f $freq; \
>     likwid-bench -t peakflops_avx512_fma -w S0:32KB:1 | grep MFlops/s; \
> done
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Running without Marker API. Activate Marker API with -m on commandline.
MFlops/s:       65948.04
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Running without Marker API. Activate Marker API with -m on commandline.
MFlops/s:       65961.83
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Running without Marker API. Activate Marker API with -m on commandline.
MFlops/s:       66007.68
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Running without Marker API. Activate Marker API with -m on commandline.
MFlops/s:       66027.81
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Running without Marker API. Activate Marker API with -m on commandline.
MFlops/s:       66007.65

Here the MFlops/s remain the same.

Looking at likwid-setFrequencies I see:

kumbhar@ldir01u01:~$ likwid-setFrequencies -t 0 -f 2.6
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
kumbhar@ldir01u01:~$ likwid-setFrequencies -p
WARN: Intel HWP capabilities enabled. CPU and Uncore frequency changes are ignored but allowed
Current CPU frequencies:
HWThread 0: governor  performance min/cur/max 1.0/3.000024/4.0 GHz Turbo 0
HWThread 1: governor  performance min/cur/max 1.0/3.000292/4.0 GHz Turbo 0
HWThread 2: governor  performance min/cur/max 1.0/2.99989/4.0 GHz Turbo 0
...

So the frequency is not set correctly? And if so, the reason for this could be ... ?

pramodk commented 1 year ago

Just to add, I am attaching the output of the command likwid-setFrequencies -f 1.0 -V 3 here: likwid_debug.txt. I am not sure if it's relevant, but I am seeing "Failed to open file":

DEBUG: Given CPU expression expands to 2 CPU sockets:
DEBUG: 0,1
DEBUG - [freq_send_direct:471] CMD READ CPU 0 FREQ_LOC_AVAIL_FREQ FD -1
DEBUG - [open_cpu_file:181] "Failed to open file /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies \n"
DEBUG - [freq_send_direct:476] CMD READ CPU 0 FREQ_LOC_CONF_MIN FD -1
DEBUG - [freq_send_direct:481] CMD READ CPU 0 FREQ_LOC_CONF_MAX FD -1
DEBUG Available freq.: 1.0, 4.0
DEBUG - [freq_send_direct:457] CMD READ CPU 0 FREQ_LOC_MAX FD 5
DEBUG - [freq_send_direct:453] CMD READ CPU 0 FREQ_LOC_MIN FD 6
DEBUG - [freq_send_client:554] DAEMON CMD WRITE CPU 0 LOC 0

[likwid_debug.txt](https://github.com/RRZE-HPC/likwid/files/10196613/likwid_debug.txt)

Edit: I haven't debugged this in detail, but I just want to say that my old installation, v5.1.0 with the minor fix discussed in https://github.com/RRZE-HPC/likwid/issues/386#issuecomment-807703655, is working as expected:

$ likwid-setFrequencies --version
likwid-setFrequencies -- Version 5.1.0 (commit: 233ab943543480cd46058b34616c174198ba0459)

And the newer installation, v5.2.0, isn't working and exhibits the behavior shown in the previous comment:

$ likwid-setFrequencies --version
likwid-setFrequencies -- Version 5.2.0 (commit: 233ab943543480cd46058b34616c174198ba0459)

When I have a bit more time, I can take a detailed look at what's going on.

TomTheBear commented 1 year ago

My guess would be these lines:

DEBUG - [freq_client_startDaemon:324] Starting daemon /gpfs/bbp.cscs.ch/project/proj16/software/bb5/install/linux-rhel7-x86_64/gcc-9.3.0/likwid-5.2.0-yaetci/sbin/likwid-setFreq
DEBUG - [freq_client_startDaemon:384] Successfully opened socket /tmp/likwid-freq-6508 to daemon
DEBUG - [HPMinit:98] Adjusting functions for x86 architecture in daemon mode
DEBUG - [freq_finalize_client:587] DAEMON CMD CLOSE

The frequency daemon, required to set the frequency, is started but then the client sends the closing signal directly.

The "Failed to open file" message is not a problem: the intel_pstate driver simply does not provide the scaling_available_frequencies file because it allows you to freely choose between min. and max. frequency.
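The same fallback can be reproduced directly on the sysfs files. The sketch below is illustrative only: list_freqs is a hypothetical helper, the default path is CPU 0's cpufreq directory, and it assumes the standard cpuinfo_min_freq and cpuinfo_max_freq files (values in kHz) are readable when scaling_available_frequencies is absent.

```sh
# Hypothetical helper, not part of LIKWID.
# Prints the discrete frequency list of a cpufreq directory if the driver
# provides one, otherwise the hardware range. sysfs values are in kHz.
# $1: cpufreq directory (defaults to CPU 0)
list_freqs() {
    dir="${1:-/sys/devices/system/cpu/cpu0/cpufreq}"
    if [ -r "$dir/scaling_available_frequencies" ]; then
        cat "$dir/scaling_available_frequencies"
    else
        # No discrete list (e.g. intel_pstate): report the min-max range
        printf 'range %s-%s kHz\n' \
            "$(cat "$dir/cpuinfo_min_freq")" "$(cat "$dir/cpuinfo_max_freq")"
    fi
}

# Usage:
#   list_freqs
```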

I also saw on one of our nodes that the frequency manipulation has no effect, so I will investigate as well. If you find anything, let me know.