GalliumOS / galliumos-distro

Docs, issues, and artwork sources for GalliumOS
https://galliumos.org/
GNU General Public License v2.0
348 stars 11 forks source link

xfce4-power-manager CPU usage spike #492

Open off-by-one opened 5 years ago

off-by-one commented 5 years ago

CPU% of xfce4-power-manager spikes to 10-100% according to top. According to strace it is reading the battery status a few thousand times a second. This problem is exacerbated by suspending, but will also occur during normal use from startup.

open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
read(14, "D", 1)                        = 1
close(14)                               = 0

Killing and restarting the process temporarily fixes the issue, but then syscall rate begins to increase again.

off-by-one commented 5 years ago

Some more metrics:

The relevant top line

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                                                        
 1240 <username>      20   0  385980  23864  16116 S  20.5  0.6  12:13.23 xfce4-power-man                                                                                                                                

At this point it was measuring around 20% utilization.

At this time over a 10 second period the process made 38,000 calls to each of open, read, and close, all of the same sysfs battery status file.

xfce4-power-manager --debug doesn't appear to have anything useful - it is entirely normal for a few seconds, and then begins repeatedly updating the battery icon

TRACE[xfpm-battery.c:193] xfpm_battery_refresh_icon(): Battery state 2
TRACE[xfpm-battery.c:244] xfpm_battery_refresh_icon(): Battery icon xfpm-primary-100

With multiple calls being made at regular intervals.

off-by-one commented 5 years ago

I think I have found a potential culprit - xfce4-power-manager is behaving as expected, which is being triggered every time upower signals a power device changed.

However, upower is sending signals frequently and multiple times per piece of information received.

<name>@<server>:~/Development/xfce4-power-manager$ upower -m
Monitoring activity from the power daemon. Press Ctrl+C to cancel.
[13:44:05.668]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:05.668]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:05.668]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:05.668]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:07.169]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:07.169]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:07.169]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:07.169]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:08.671]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:08.671]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:08.671]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0
[13:44:08.671]  device changed:     /org/freedesktop/UPower/devices/battery_BAT0

However, UPower does not have this behavior when the power manager is off - it is triggered when the power manager is on for a few minutes.

lrnzo commented 5 years ago

on a thinkpad x230I had this problem too. As soon as I remove the battery and use powercable instead, the cpu usage of upowerd is back to normal

Ferrantisimo commented 5 years ago

I have this issue (hammer opening /sys/class/power_supply/BAT0/status) on my Acer Chromebook 14 EDGAR / Braswell. Max CPU usage I've observed is probably 5-10%.

My observations are similar to others above. xfce4-power-manager will behave after reboot, but slowly and incrementally increase its CPU utilization the longer the system runs. Restarting it with xfce4-power-manager --restart will reset the CPU utilization before it begins creeping up again. On my system, there are actually two instances of xfce4-power-manager started when logging in after a cold boot, one with a session ID (e.g. xfce4-power-manager --restart --sm-client-id 982048394-293b-3424-2394-198987523483) and a second instance with a slightly higher PID and no command line options. In these cases, it is the second instance that will exhibit this CPU-consuming behavior. The first instance seems to not be providing any power management services (i.e. visiting XFCE Settings > Power Manager will prompt to launch xfce4-power-manager), so the second instance is both the source of power management services as well as this CPU-consumption issue.

Regarding what is going on with xfce4-power-manager itself, strace reveals a few things:

  1. The CPU consumption is from repeated open/read/close calls to /sys/class/power_supply/BAT0/status (about 2k/second maximum, on my hardware)
  2. The frequency of the open/read/close calls likely comes from a poll() call which appears to be blocking in a loop with the open/read/close battery calls. After any power-related event, the polling time for this poll() call will block for shorter and shorter intervals with repeated open/read/close calls in between.
  3. Restarting xfce4-power-manager will reset this short polling interval and thereby reduce the CPU usage

You can observe this process on any affected system by running strace -tt xfce4-power-manager --no-daemon --restart. Power management events (e.g. unplugging and plugging in the system) will increase the frequency of the open/read/close calls.

Here's a sample strace where you can see the initial 60-second poll() interval, followed a couple of minutes later by a much shorter poll() interval and many more open/read/close calls per second: (fd3 here is the X11 local socket /tmp/.X11-unix/X0, and fd4 is an eventfd object that I suspect is used to coordinate event handling between parent/child instances of xfce4-power-manager)

13:22:37.280812 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN|POLLPRI}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}], 6, 59808) = 1 ([{fd=4, revents=POLLIN}])
13:22:47.876784 recvmsg(3, 0x7ffffb5bf010, 0) = -1 EAGAIN (Resource temporarily unavailable)
13:22:47.877127 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN|POLLPRI}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}], 6, 1500) = 1 ([{fd=4, revents=POLLIN}])
13:22:47.877426 read(4, "\2\0\0\0\0\0\0\0", 16) = 8
13:22:47.877540 recvmsg(3, 0x7ffffb5bf010, 0) = -1 EAGAIN (Resource temporarily unavailable)
13:22:47.877649 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN|POLLPRI}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}], 6, 1499) = 0 (Timeout)
13:22:49.379058 read(4, 0x7ffffb5bf1c0, 16) = -1 EAGAIN (Resource temporarily unavailable)
13:22:49.379451 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:22:49.379946 read(14, "C", 1)        = 1
13:22:49.404991 close(14)               = 0
13:22:49.405241 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:22:49.405522 read(14, "C", 1)        = 1
13:22:49.405726 close(14)               = 0
13:22:49.405979 recvmsg(3, 0x7ffffb5bf010, 0) = -1 EAGAIN (Resource temporarily unavailable)
13:22:49.406214 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN|POLLPRI}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}], 6, 1474) = 0 (Timeout)

*SNIP*

13:25:06.561689 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN|POLLPRI}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}], 6, 103) = 0 (Timeout)
13:25:06.667027 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.667444 read(14, "C", 1)        = 1
13:25:06.667709 close(14)               = 0
13:25:06.667979 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.668276 read(14, "C", 1)        = 1
13:25:06.668513 close(14)               = 0
13:25:06.668741 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.669092 read(14, "C", 1)        = 1
13:25:06.669300 close(14)               = 0
13:25:06.669463 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.669801 read(14, "C", 1)        = 1
13:25:06.670036 close(14)               = 0
13:25:06.670257 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.670550 read(14, "C", 1)        = 1
13:25:06.670828 close(14)               = 0
13:25:06.671055 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.671346 read(14, "C", 1)        = 1
13:25:06.671586 close(14)               = 0
13:25:06.671854 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.672142 read(14, "C", 1)        = 1
13:25:06.672378 close(14)               = 0
13:25:06.672616 recvmsg(3, 0x7ffc33debd80, 0) = -1 EAGAIN (Resource temporarily unavailable)
13:25:06.672939 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN|POLLPRI}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}], 6, 23) = 0 (Timeout)
13:25:06.696823 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.697284 read(14, "C", 1)        = 1
13:25:06.697589 close(14)               = 0
13:25:06.697912 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.698261 read(14, "C", 1)        = 1
13:25:06.698548 close(14)               = 0
13:25:06.698856 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.699202 read(14, "C", 1)        = 1
13:25:06.699479 close(14)               = 0
13:25:06.699739 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.700127 read(14, "C", 1)        = 1
13:25:06.700404 close(14)               = 0
13:25:06.700657 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.701034 read(14, "C", 1)        = 1
13:25:06.701308 close(14)               = 0
13:25:06.701562 open("/sys/class/power_supply/BAT0/status", O_RDONLY) = 14
13:25:06.701913 read(14, "C", 1)        = 1
13:25:06.702151 close(14)               = 0
13:25:06.702394 recvmsg(3, 0x7ffc33debd80, 0) = -1 EAGAIN (Resource temporarily unavailable)
13:25:06.702707 poll([{fd=3, events=POLLIN}, {fd=4, events=POLLIN}, {fd=5, events=POLLIN}, {fd=6, events=POLLIN|POLLPRI}, {fd=7, events=POLLIN}, {fd=9, events=POLLIN}], 6, 3) = 0 (Timeout)

I have a desktop (i.e. no battery) system that runs XFCE 1.4.4 (Slackware 14.2), and xfce4-power-manager there will poll() initially for 60 seconds, then poll at -1 (forever) afterward while waiting for power events to handle. It does not exhibit the CPU-consuming behaviors described above.

I didn't see this rapid open/read/close issue in any of the xfce4-power-manager bugs but I just looked briefly. The last commit for 1.4.4 was 4 years ago, so perhaps this will all fall away if/when Gallium ships an updated XFCE. If anyone has gdb and can build a debugging-enabled xfce4-power-manager, letting it run for a bit and setting a breakpoint on the poll() or open() calls will probably identify the culprit code pretty quickly. I am diskspace limited on my Gallium machine and can't really install a full set of GTK2+ development files (+a bazillion dependencies) to catch this bug.

Finally, upower isn't generating events fast enough to feed this crazy behavior from xfce4-power-manager and upower may only be "responsible" to the degree that xfce4-power-manager will poll() faster and faster with each power management event. However, it is interesting to note that upower detects a battery "device changed" event about every 3 minutes, which feeds the loop in xfce4-power-manager. Additionally, the event that triggers the short poll() intervals is this battery change event which is noted by upower. You can see this in the 60sec poll() event that terminates with an event on fd4 ([{fd=4, revents=POLLIN}]), which occurs simultaneously with a device changed: /org/freedesktop/UPower/devices/battery_BAT0 from upower -m.