bbidulock / icewm

A window manager designed for speed, usability, and consistency
Other
577 stars 98 forks source link

Feature/ignore thermal zones #634

Closed alspitz closed 2 years ago

alspitz commented 2 years ago

Some thermal zones' temperatures take a long time to read (e.g. 0.5 seconds). When ACPI temperatures are displayed in the CPU graph, these temperatures are polled repeatedly, and any blockage, blocks the entire window manager. This change adds an option, IgnoreThermalZones, which allows any such offending thermal zones to be skipped and not slow down the window manager.

Also, I changed the temperature display in the CPU graph to display all (non-ignored) ACPI temps. Let me know if you think this should be adaptive based on the width of the CPU monitor (with a low-width CPU monitor, they won't fit), or configurable in some way.

Code7R commented 2 years ago

If the blocking is troubling you, wouldn't it be the better choice to add a timeout option to filereader::read_all, and cache the last value in-between?

All that assuming that reading this file can be performed without blocking after some time. Could you test this please?

gijsbers commented 2 years ago

Can it be done without introducing a new option?

Which files take a long time to read? Can you do:

for f in /sys/class/thermal/*/temp; do time grep '' $f; done

apppstatus.cc avoids reopening the same file and uses lseek. Would that help?

The question is if the read takes so much time, or the opening and closing of the file.

Could you post images of your extended graph?

alspitz commented 2 years ago

Using lseek doesn't seem to speed up the read. Running this Python code:

import os, random, time
import numpy as np
p = "/sys/class/thermal"
dat = { fn : (open(os.path.join(p, fn, "temp")),
              open(os.path.join(p, fn, "type")).read().strip(),
              [])
        for fn in os.listdir(p) if "thermal" in fn}
for i in range(50):
  for fn in random.sample(list(dat.keys()), len(dat)):
    fd = dat[fn][0].fileno()
    os.lseek(fd, 0, os.SEEK_SET)

    st = time.time()
    temp = os.read(fd, 10)
    dt = time.time() - st

    dat[fn][2].append(dt)

for fn in sorted(dat.keys()):
  dts = dat[fn][2]
  print("%s (%s)\n\tmean: %f +- %f, max: %f" % (fn, dat[fn][1], np.mean(dts), np.std(dts), np.max(dts)))

results in

thermal_zone0 (acpitz)
        mean: 0.001641 +- 0.000296, max: 0.002463
thermal_zone1 (pch_skylake)
        mean: 0.000021 +- 0.000007, max: 0.000034
thermal_zone2 (iwlwifi_1)
        mean: 0.146807 +- 0.088839, max: 0.205067
thermal_zone3 (x86_pkg_temp)
        mean: 0.000037 +- 0.000024, max: 0.000127

which is very similar to what happens if the file is open and closed each time.

I'm not sure how to not display iwlwifi_1 without a new option without adding a lot of complexity, especially since sometimes, the read happens quickly. If there is a read timeout, what should the timeout be? Any block will still slow down the window manager, so I'd rather not add a timeout. The ignore option is nice because I don't particularly care about the wifi card temperature since it doesn't trend with CPU usage as the CPU temp does. The ignore option also seems consistent with the ignore batteries option. Each system will potentially have a different thermal zone that may possibly be slow to read, so having to configure it seems okay. The only issue I see is that someone may enable the temperature in the graph option and not realize that their window manager is stuttering due to the slow read and thus never add the appropriate ignore option. An automatic solution may be something where if any temp read ever takes more than some set time (dependent on refresh rate?), that thermal zone is excluded from the thermal zones that are read, with perhaps some way to get back into the "good" thermal zones club with good behavior. Or some asynchronous read would be ideal, but not sure how well it can be added to the current architecture. But at least the current pull request can only help the situation, at the cost of an extra option.

Here are some images of the CPU graph I use, with a width of 300 samples: cpu-graph cpu-graph2 and the default width of 20: cpu-graph3 Currently, it has 3 temperatures, because I'm ignoring iwlwifi_1. Incidentally, it doesn't look too bad with the default width of 20 samples, but only one temp is shown, and it's effectively chosen at random (according to the filesystem directory listing).

gijsbers commented 2 years ago

Very good! You can simply always ignore anything containing wifi or starts with iw. We only care about CPU temperature. The first examination should check their type and give priority to x86_pkg_temp, or else ^pch. The acpitz is less interesting. Always reading all those files is wasteful. One or two should suffice.

alspitz commented 2 years ago

It's definitely possible to exclude based on wifi or only include those matching x86_pkg_temp or ^pch, but I'm worried that would exclude non-Intel systems. A small number of other net drivers also seems to register a thermal zone and I imagine e.g. ARM-based systems won't have x86_pkg_temp. The additional option would allow anyone to make it work for them regardless of their system without changing the code.

Some options are:

  1. Hardcode ignoring of wifi and potentially prioritize x86 or pch for the temperature in the graph (and tooltip as well? Could show all in tooltip and only one in graph).
  2. Use ignore option to ignore bad ones and use all by default (as in this PR).
  3. Use whitelist option (so I would put x86_pkg_temp, acpitz, and pch_skylake in my preferences) and use all by default.
  4. Use libsensors for this. Detecting the "CPU temperature" reliably seems difficult given all systems out there. E.g. htop does this : https://github.com/htop-dev/htop/blob/main/linux/LibSensors.c#L144 . Would probably be best to make it an optional dependency. As an aside /sys/class/hwmon appears to be the "new" way to do this and gives more temperatures, with labels, and other info. E.g. psutil uses this: https://github.com/giampaolo/psutil/blob/master/psutil/_pslinux.py#L1316
  5. Combine 1 with 2 or 3, allowing the hardcoded priorities / ignores to be overridden using an option (keeps somewhat sane default behavior but still customizable).

What do you think?

By "first examination" did you mean to run this check once on startup (and just read temp later) or each time the temperature is needed? Reading the files should ideally be fast (well iwlwifi excluded), since they are not in the filesystem. In any case, it can all probably be done using open once, save FD, and lseek, instead of open each time.

gijsbers commented 2 years ago

We only care about the CPU type we can test now. We don't need or want any option for this ever. Most users never use this feature. The ones that do have only a default size graph. Take this order: x86_pkgtemp, ^pch, acpitz. Ignore any other module. Ignore other software like libsensors, because that creates dependencies and this is a minor feature.

An alternative is the hwmon coretemp module.

grep '' /sys/class/hwmon/*/*

That gives a temperature for the package, plus per core. There can be more than one package, but rarely so. In theory, there can be many cores. Maybe read the temperature at a slower rate.

alspitz commented 2 years ago

Hardcoding these will be less maintainable than making it customizable via an option, imo. The CPU and net graphs in the taskbar is one of the main reasons I use IceWM and having it be customizable is important to me. But I realize most users don't use them in the way I do.

Any particular reason adding an option is not recommended?

In any case, the slowdown due to iwlwifi affects all users using temp in graph regardless of its width, so I've pushed #635 Please let me know what you think.

Thanks for considering!

gijsbers commented 2 years ago

Don't change the string for the centigrade symbol, because then all the translations fail.

The temp buffer may not be initialized when it immediately returns.

You can customize to infinity. That is called programming. But it must also be maintainable forever. Optimize the ratio of (the number of users the time they use it) / ((the amount of code + documentation10) * the period to maintain it).

By avoiding new options we have to be smarter and relieve the user of all this boring configuration and documentation of teeny weeny details.

If TaskBarCPUSamples<=20 we use just one value. The one for x86_pkgtemp, if it exists, otherwise ^pch, otherwise acpitz. If TaskBarCPUSamples > 20 you can add more values, as long as they fit.

alspitz commented 2 years ago

The thermal zones are now sorted by priority, but it's not the cleanest.

Do you mean the temp in temperature? It shouldn't be used if the function returns since it's a local var.

gijsbers commented 2 years ago
    char temp[50];
    int len = getAcpiTemp(temp, sizeof(temp));

Here getAcpiTemp is not guaranteed to init temp?

We only want to read defined tzs, since any other may be a device like iwlwifi and hence be much slower. Hence only accept strictly positive prios > 0. We don't need slow strstr, but just consider prefixes.

char tzPriority(const char* tz_type) {
    const char* tzs[] = { "x86_pkg_temp", "pch_", "acpitz" };
    for (int i = 0; i < 3; ++i) {
        if (strncmp(tz_type, tzs[i], strlen(tzs[i])) == 0)
            return 4 - i;
    }
    return 0;
}

Are these priorities read just once on startup? That would be justified.

What bothers me is that much more is read than is used. Could we read just what is actually going to be shown?

You may consider reading the temperature at a slower rate, like once every two seconds. Would be efficient.

gijsbers commented 2 years ago

A design where this is encapsulated could look like:

class YTemp : private YTimerListener {
    bool handleTimer(YTimer *timer) {
        flush_data();
        return true;
    }
    char buf[24];
public:
    YTemp() {
        read_types();
        timer->setTimer(2000, this, true);
    }
    char* operator[](int index) {
        if (data[index] == invalid)
             read_data(index);
        snprintf(buf, 24, "%f %s", data[index], centigrade);
        return buf;
    }
}

Then data for each thermometer would be read at most once per two seconds, and only if it wasn't yet cached.

alspitz commented 2 years ago
    char temp[50];
    int len = getAcpiTemp(temp, sizeof(temp));

Here getAcpiTemp is not guaranteed to init temp?

I see, okay to be safe (e.g. if drawChars accesses temp even if len is zero) I am now returning early if len is zero.

We only want to read defined tzs, since any other may be a device like iwlwifi and hence be much slower. Hence only accept strictly positive prios > 0. We don't need slow strstr, but just consider prefixes.

char tzPriority(const char* tz_type) {
    const char* tzs[] = { "x86_pkg_temp", "pch_", "acpitz" };
    for (int i = 0; i < 3; ++i) {
        if (strncmp(tz_type, tzs[i], strlen(tzs[i])) == 0)
            return 4 - i;
    }
    return 0;
}

I like the loop and strncmp implementation; using that now. But I don't like that idea of hardcoding a whitelist. Currently, the hardcoded entries are only for prioritization and the exclusion of iwlwifi. If we only allow hardcoded entries, users would not be able to read any other thermal zones without changing the code. But yes, if someone has a thermal zone that is slow to read that is not iwlwifi, they would see the slowdown. This is why I think having an option is the best idea and it's in line with other options controlling network interfaces and batteries' status displays in the taskbar.

Are these priorities read just once on startup? That would be justified.

What bothers me is that much more is read than is used. Could we read just what is actually going to be shown?

You may consider reading the temperature at a slower rate, like once every two seconds. Would be efficient.

Yes maybe right now it's not the most efficient. Every TaskBarCPUDelay milliseconds, all thermal zone types are read, and any nonnegative priority thermal zones that fit into the CPU graph have their temps read.

I'm not sure about once on startup. Is it not possible for a thermal zone to appear after startup? I also don't like hardcoding an update rate without having an option to configure it. Currently, it's updated at the same rate the CPU graph is updated (default 500 ms).

gijsbers commented 2 years ago

If we only allow hardcoded entries, users would not be able to read any other thermal zones without changing the code.

I explained before why adding an option is worse. I checked the entries and they suffice. If another ever pops up then someone can notice us and we'll add it to a new release. We don't care about whole new packages of CPUs appearing. That is so exceptional and only to be expected on exotic servers. Otherwise users can still icewm -r. This is a minor minor feature, so keep it low-profile. Move everything temperate to a separate class YTemp. That gives much more modular expressivity. Replace the special case of tzPriority ignored devices from -1 to 0. I.e., then we only consider non-zero priority devices and we get rid of the signed char.

gijsbers commented 2 years ago

Commit 2a70667 implements my current thinking about this issue.