Alexays / Waybar

Highly customizable Wayland bar for Sway and Wlroots based compositors. :v: :tada:
MIT License

Sometimes, under complex microarchitectural conditions, Waybar gets into a state where it consumes 100% of a CPU core until I restart it #3773

Open meithecatte opened 1 week ago

meithecatte commented 1 week ago

This time I decided to actually debug it. The process is currently SIGSTOP'd, awaiting further analysis. A perf trace shows that the CPU time is being spent in the pa_read function of libpulsecommon-17.0.so.

Here is the relevant part of the configuration file:

    "pulseaudio": {
        // "scroll-step": 1, // %, can be a float
        "format": "{volume}% {icon} {format_source}",
        "format-bluetooth": "{volume}% {icon} {format_source}",
        "format-bluetooth-muted": " {icon} {format_source}",
        "format-muted": " {format_source}",
        "format-source": "{volume}% ",
        "format-source-muted": "",
        "format-icons": {
            "headphone": "",
            "hands-free": "",
            "headset": "",
            "phone": "",
            "portable": "",
            "car": "",
            "default": ["", "", ""]
        },
        "on-click": "pavucontrol"
    },

Possibly relevant versions:

~$ pacman -Q waybar libpulse pipewire-pulse pipewire
waybar 0.11.0-3
libpulse 17.0-3
pipewire-pulse 1:1.2.6-1
pipewire 1:1.2.6-1

Let me know how you'd like me to poke the process to try and debug this further.

meithecatte commented 5 days ago

I think this often happens when I scroll over the pulseaudio module to change the volume. Either the scrolling is the cause, or it is just how I notice the problem, because scrolling stops working once Waybar is in the weird state.

When it happens, the backtrace on the busy thread is this:

#0  0x0000755824e2bc5a in read () from /usr/lib/libc.so.6
#1  0x00007558243e34c5 in pa_read () from /usr/lib/pulseaudio/libpulsecommon-17.0.so
#2  0x000075582551c2a7 in pa_mainloop_prepare () from /usr/lib/libpulse.so.0
#3  0x000075582551c60d in pa_mainloop_iterate () from /usr/lib/libpulse.so.0
#4  0x000075582551c6d1 in pa_mainloop_run () from /usr/lib/libpulse.so.0
#5  0x000075582552cbf2 in ?? () from /usr/lib/libpulse.so.0
#6  0x000075582441b2b7 in ?? () from /usr/lib/pulseaudio/libpulsecommon-17.0.so
#7  0x0000755824db439d in ?? () from /usr/lib/libc.so.6
#8  0x0000755824e3949c in ?? () from /usr/lib/libc.so.6

The individual calls to pa_mainloop_iterate do return properly, so the thread is not stuck inside a single call; it keeps iterating without ever blocking.

meithecatte commented 5 days ago

It seems that in pa_mainloop_poll, m->n_enabled_defer_events gets stuck at 1, which prevents the function from actually blocking and waiting for events. In normal operation, m->n_enabled_defer_events resets to zero after one call, allowing the events to be pumped.
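To make the symptom concrete, here is a toy model of the behaviour described above. It is not PulseAudio's code: `ToyLoop`, `iterate`, and `simulate` are hypothetical names, and the only thing the sketch assumes is what the comment states, namely that an iteration skips the blocking wait while the defer-event counter is non-zero.

```cpp
#include <cassert>

// Toy model only (hypothetical names, not PulseAudio's implementation).
// An iteration blocks in poll() only when no defer events are enabled.
struct ToyLoop {
    int n_enabled_defer_events = 0;
    int blocking_polls = 0;   // iterations that would sleep waiting for events
    int busy_iterations = 0;  // iterations that returned without sleeping

    // counter_resets models whether dispatching the defer events brings the
    // counter back to zero (normal operation) or leaves it stuck.
    void iterate(bool counter_resets) {
        if (n_enabled_defer_events > 0) {
            ++busy_iterations;  // wait is skipped, the loop spins
            if (counter_resets) {
                n_enabled_defer_events = 0;
            }
        } else {
            ++blocking_polls;   // would block until an event arrives
        }
    }
};

// Start with one enabled defer event and run the loop a few times.
ToyLoop simulate(bool counter_resets, int iterations) {
    ToyLoop loop;
    loop.n_enabled_defer_events = 1;
    for (int i = 0; i < iterations; ++i) {
        loop.iterate(counter_resets);
    }
    return loop;
}
```

In this model, `simulate(true, 5)` spends one iteration dispatching and the remaining four blocking, while `simulate(false, 5)` never blocks at all, which matches a thread pinned at 100% CPU.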

meithecatte commented 5 days ago

This looks like what would happen if PulseAudio's locking requirements were not upheld. I'm not familiar enough with the codebase to tell for sure, but it seems that AudioBackend::changeVolume does not lock the PulseAudio mainloop's mutex, even though it is not called from a mainloop callback (where the lock would already be held).
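A minimal sketch of the kind of fix this would suggest, assuming the backend uses libpulse's threaded mainloop. It is not Waybar's actual code: `FakeLoop`, `fake_lock`, `fake_unlock`, `LoopGuard`, and `change_volume_locked` are hypothetical stand-ins so that the example compiles without libpulse. In real code the guard would wrap `pa_threaded_mainloop_lock` and `pa_threaded_mainloop_unlock`.

```cpp
#include <cassert>

// Hypothetical stand-in for pa_threaded_mainloop, so the sketch is self-contained.
struct FakeLoop {
    int depth = 0;         // > 0 while the lock is held
    int lock_calls = 0;
    int unlock_calls = 0;
};
void fake_lock(FakeLoop* l) { ++l->depth; ++l->lock_calls; }
void fake_unlock(FakeLoop* l) { --l->depth; ++l->unlock_calls; }

// RAII guard: locks on construction, unlocks on every exit path.
// With libpulse one would instantiate it as
// LoopGuard<pa_threaded_mainloop, pa_threaded_mainloop_lock, pa_threaded_mainloop_unlock>.
template <typename Loop, void (*Lock)(Loop*), void (*Unlock)(Loop*)>
class LoopGuard {
public:
    explicit LoopGuard(Loop* loop) : loop_(loop) { Lock(loop_); }
    ~LoopGuard() { Unlock(loop_); }
    LoopGuard(const LoopGuard&) = delete;
    LoopGuard& operator=(const LoopGuard&) = delete;

private:
    Loop* loop_;
};

// Hypothetical volume change issued from a non-mainloop thread, such as a
// scroll handler. Must not be used from inside a mainloop callback, where
// the lock is already held.
bool change_volume_locked(FakeLoop* loop) {
    LoopGuard<FakeLoop, fake_lock, fake_unlock> guard(loop);
    // A real backend would call e.g. pa_context_set_sink_volume_by_index here.
    return loop->depth == 1;  // true: the call happens while the lock is held
}
```

The guard form is chosen so that an early return or exception between the lock and the PulseAudio call cannot leave the mainloop locked.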

I am currently considering building Waybar with ThreadSanitizer to check this empirically, but I don't yet know a good way to handle all the dependencies.