brndnmtthws / conky

Light-weight system monitor for X, Wayland (sort of), and other things, too
https://conky.cc
GNU General Public License v3.0
7.17k stars 615 forks source link

[Bug]: Querying windows every event causes significant CPU usage #1852

Closed Caellian closed 4 months ago

Caellian commented 5 months ago

With fluxbox as WM, a right click on the root window leads to 3 second delay waiting for the WM right-click menu to appear, with 100% CPU in the meantime. I bisected the problem to f4b3229fc34e1acc9a2e814abee1d886049a4560

Originally posted by @jborme in https://github.com/brndnmtthws/conky/issues/1821#issuecomment-2068056821

Workaround: disable BUILD_XINPUT flag.

lineage-of-roots commented 5 months ago

I tried on Fluxbox. I can confirm that there is a significant delay on the hover events. And Conky's cpu usage jumps really high as well.
The Fluxbox menu delay is not a problem by itself. It seems its the accumulated hover events before the click that are causing the problem. By leaving the mouse cursor on Conky, right clicking and pressing escape to remove the menu...and repeat, I saw that the menu reappears without delay. The menu has no delay if you wait about 3 seconds before right clicking. I will try later on other wms

  return propagate_xinput_event(
        reinterpret_cast<const conky::xi_event_data *>(cookie));

Seems like this is where it starts going wrong.

EDIT: Nope...The delay happens before that. I bypassed it. The delay and cpu usage jump was still there.

Caellian commented 5 months ago

Hmm... Most work happens while trying to figure out which window to send the event to.

Which also, if that's the problematic part, won't be fixed by disabling BUILD_XINPUT.

Mouse click is the simplest event to handle as well, so it doesn't make sense for it to be held up there.

I won't be able to fix this until later this week though due to midterms.

Personal note while fixing this: Added conky atoms for tweaking XInout behavior might not be needed in most cases - normal windows get correct relative/absolute values and only other types (myb root as well) seem to always get absolute values on Open box (check other WMs, might be an Openbox thing).

lineage-of-roots commented 5 months ago

Update: So I tested across many Window Managers and a couple of Desktop Environments There is a significant jump in cpu usage when hovering on conky. It's consistent across all WMs and DEs, except one outlier that I found.

Conky also makes Xorg jump really high in cpu usage. They both jump together.

The outlier I found was jwm. It does not seem to care about the hover events. EDIT: The cpu usage does jump, but not as high. The menu appears normally as well. I don't yet know what it's doing different. Might be a clue.

Fluxbox seems like the only one with more delay in the hover events, therefore making it look like the menu is delayed.

I testsed on MATE and KDE Plasma, same high cpu usage on hover events. I am guessing other DEs like Gnome and Cinnamon will also see jumps in cpu usage.

There's some back and forth going on between Conky and Xorg. Could it be that the hover event conky is sending to root window is being sent back to conky?

I will jump in the code later to find out what's causing the sudden high cpu usage in Conky and Xorg

lineage-of-roots commented 5 months ago

I just noticed something peculiar

The hover doesn't have to be on Conky...I can move the mouse anywhere else, Conky and Xorg will still jump high in cpu usage. As long as I am moving the mouse, the cpu usage remains high.

Jumps even higher when the hover is on Conky itself.

This doesn't happen when Conky is not running.

This behavior is not limited to Fluxbox. Happens elsewhere as well.

EDIT: This could be a more significant issue depending on the hardware, such as the person who originally posted about this.

Caellian commented 5 months ago

Could it be that the hover event conky is sending to root window is being sent back to conky?

No, it's because conky is making a lot of X11 requests to query the window underneath it every pixel it moves.

The hover doesn't have to be on Conky

Also why. It's tracking events globally.

Caellian commented 5 months ago

We could cache window AABBs in order to reduce the number of calls and processing needed. I just need to figure out how we can listen to changes to _NET_CLIENT_LIST.

I maybe focused too much with correctness because lack of _NET_CLIENT_LIST (on WMs that don't support EWMH (dwl, maybe some others)) causes the code to fall back to window tree traversal which is much much worse.

I think I'll feature gate this fallback with an off-by-default compile flag in case someone has a beefy computer and doesn't mind, and otherwise fallback to old behavior of using window.desktop in case _NET_CLIENT_LIST doesn't exist.

I can probably update AABB cache using MapNotify, UnmapNotify, VisibilityNotify and ConfigureNotify events. Nice thing about having that cache in conky is that window variables can also use it.

jborme commented 5 months ago

@Caellian

Hmm... Most work happens while trying to figure out which window to send the event to. Which also, if that's the problematic part, won't be fixed by disabling BUILD_XINPUT.

The problem I reported is fixed at my place by disabling BUILD_XINPUT. :+1:

Caellian commented 5 months ago

The problem I reported is fixed at my place by disabling BUILD_XINPUT. 👍

Which version are you on?

lineage-of-roots commented 5 months ago

I was too caught up in the previous Conky behavior with hovering over Conky. I hadn't realized this was new.

I have to agree. This new behavior is too resource intensive. If someone is working on say GIMP or any other application that requires alot of mouse movements, they might not appreciate Conky taking up so much of the resources.

Wouldn't it be better just to deal with mouse events ONLY if they happen on the Conky window?

The only usual/normal use case is DE/WM menus...Other use cases will have to be more convoluted. The Previous behavior was sufficient for that. The only problem AFAIK was MATE Desktop Environment not showing the menu... which can be sacrificed lol.

I maybe focused too much with correctness because lack of _NET_CLIENT_LIST (on WMs that don't support EWMH (dwl, maybe some others)) causes the code to fall back to window tree traversal which is much much worse.

I think I'll feature gate this fallback with an off-by-default compile flag in case someone has a beefy computer and doesn't mind, and otherwise fallback to old behavior of using window.desktop in case _NET_CLIENT_LIST doesn't exist.

They would have to be really obscure WMs...And as far as Wayland goes, I know next to nothing about it. I don't use it or its window managers. I played around with Hyprland a bit though. All I know is that there's still too many issues with Wayland.



I chased JWM down the rabbit hole for a bit and discovered a couple of different behaviors. On JWM the cpu usage of Conky and Xorg do not go as high as as all the others. They definitely jump up, but don't hit the same peaks of the others The second thing is that when hovering over Conky in JWM, ` propagate_x11_event(ev, cookie); ` does not get executed in the function ` void process_surface_events(conky::display_output_x11 *surface, Display *display) ` "Consume" becomes true somewhere in ` bool handled = process_event(surface, display, ev, &consumed, &cookie); ` That's how it happens mostly, at least. The unusual way to make ` propagate_x11_event(ev, cookie); ` execute in JWM is to bring up the JWM menu on top of Conky, then hover on and off the menu. There will be an event for hovering on and off the Menu. This is strange because that would mean conky thinks it's both above and below the menu.
jborme commented 5 months ago

Which version are you on?

I tested BUILD_XINPUT with 1.19.7 and 1.20.1, in both cases the compile option solves the issue.

Caellian commented 5 months ago

Wouldn't it be better just to deal with mouse events ONLY if they happen on the Conky window?

Tried that initially and it doesn't work for some window types - own_window_type=desktop can't capture clicks, own_window=false can't capture movement, window enter/leave events don't get reported properly if the cursor moves from a window to conky (only desktop<->conky worked).

And as far as Wayland goes, I know next to nothing about it.

This is an X11 bug/thing. Wayland worked flawlessly across all compositors I tested, with only 20 lines of dead simple code....

The unusual way to make propagate_x11_event(ev, cookie); execute in JWM is to bring up the JWM menu on top of Conky, then hover on and off the menu. There will be an event for hovering on and off the Menu.

Right... that's a separate bug then.. I have to examine the window tree for both JWM and fluxbox.

I tested BUILD_XINPUT with 1.19.7 and 1.20.1, in both cases the compile option solves the issue.

Thanks. I might produce a debug branch that spits out some additional debug information at some point if I'm not able to reproduce the issue.

lineage-of-roots commented 5 months ago

Tried that initially and it doesn't work for some window types - own_window_type=desktop can't capture clicks, own_window=false can't capture movement, window enter/leave events don't get reported properly if the cursor moves from a >window to conky (only desktop<->conky worked).

own_window=false is only for tiling window managers, right? AFAIK, you have to jump through hoops to make tiling WMs do anything with right clicks on desktop. In which case, they might as well avoid clicking on conky. Also, there wouldn't be much of a point to motion on Conky in tiling WMs as well?

If own window is true on tiling WMs, then conky gets tiled like any other window. Only when own window is false, conky gets drawn on the desktop in tiling WMs. (EDIT: own_window can be true on tiling wms, but own_window_type will have to be desktop)

And as far as floating WMs go, own_window=yes, and own_window_type=normal should be the default/recommended setting. If own_window is false in floating WMs, conky doesn't show up. It runs, but won't show up. Not sure what happens to it there.

I have not investigated any differences in DEs when it comes to own_window_type=desktop/normal yet. I am guessing own_window_type=normal should be the default there as well.

This is an X11 bug/thing. Wayland worked flawlessly across all compositors I tested, with only 20 lines of dead simple code....

You mentioned dwl, so I thought maybe the changes had something to do with Wayland or Xwayland.

Right... that's a separate bug then.. I have to examine the window tree for both JWM and fluxbox.

I didn't mean to present the JWM interaction as a bug. It was just something quirky I noticed. JWM works normally for the most part. Better than others, infact. The menu shows up on right click as well. It's just the hover/motion events that behave differently.



Other than that...I have been investigating the events with the xev tool. From what I have seen, EnterNotify and LeaveNotify events are all there. Whether you hover on conky from the desktop or from another window, or from another window then desktop between then conky. Tested on conky 1.19.8 EDIT The EnterNotify and LeaveNotify events are there on both own_window_type=normal and own_window_type=desktop I will update more about this further, if I discover anything else.
Caellian commented 5 months ago

Started working on this, will link PR once I have some working code so the cache idea can be tested.

lineage-of-roots commented 5 months ago

So here's something Major

In my testing, the compositor is making a difference in floating window managers. If picom is running, own_window=false can't be used. Conky runs, but will not show up.

If picom is not running, own_window=false will have conky drawing as part of the root window. Doing an xwininfo on it then will have it show up as root window. At which point, clicks on conky are the same as clicks on desktop/root window.

This behavior is the same across FLuxbox, Openbox, JWM, IceWM, Blackbox

And as previously stated, when own_window=true is used then normal/desktop both show EnterNotify and LeaveNotify consistently. (This is independant of picom)

EDIT: Tested on version 1.19.8

Caellian commented 5 months ago

@jborme & @lineage-of-roots, can you test fix/bad-client-list-check branch to see whether it resolves the issue?

lineage-of-roots commented 5 months ago

@Caellian

I can do the tests, but about 12 hours later.

In the meantime, I found this ( I have not gone through the relevent code in conky yet, but this seems relevant to what you mentioned)

window enter/leave events don't get reported properly if the cursor moves from a >window to conky (only desktop<->conky worked).

From the Docmentation:

The X server reports MotionNotify events to clients wanting information about when
the pointer logically moves.
The X server generates this event whenever the pointer is moved and the pointer 
motion begins and ends in the window. 
The granularity of MotionNotify events is not guaranteed,
but a client that selects this event type is guaranteed to receive
at least one event when the pointer moves and then rests. 

Relying on MontioNotify has this drawback/feature. The pointer has to start and end in the desired window.

If the code is relying on MotionNotify to know when conky has been hovered over, then this seems most probably where the inconsistency is coming from.

EnterNotify and LeaveNotify are consistent.

Own_window=false can't generate the event for conky, because conky becomes the same as the root window. There's only events for the root window at that point.

Caellian commented 5 months ago

@lineage-of-roots Ok, but we're not using regular MotionNotify events and instead XInput global ones, so that's unrelated. Enter/LeaveNotify aren't consistent because they don't get reported in a lot of cases (see #1495).

Let me know whether fix/bad-client-list-check resolves the issue when you can, I'm pretty sure it should be the appropriate fix.

Caellian commented 5 months ago

Merged into main as it was an obvious defect. So try building from main and let me know whether I can close this issue.

jborme commented 5 months ago

I built conky after cloning the repository into a temp folder, I think that gives me the most current version. It does not solve the issues, though the exact symptoms are slightly different. fluxbox does not lose "focus" (previously rxvt terminal lost the "cursor" when the bug appeared ), cpu load only goes to a total of 1.5 instead of more as I remember, and the menu takes 7 seconds to appear.

Caellian commented 5 months ago

Alright, I'll do some more testing and try to generate a flamegraph to best see what's the issue as I don't see much performance issues on my side. Just gotta re-learn how to do that 😆

lineage-of-roots commented 5 months ago

@Caellian

I will get to testing shortly...In the meantime I wanted to clear/understand a few things.

@lineage-of-roots Ok, but we're not using regular MotionNotify events and instead XInput global ones, so that's unrelated. Enter/LeaveNotify aren't consistent because they don't get reported in a lot of cases (see #1495).

I meant previously, when there was no XISelectEvents()

In #1495, the first thing I noticed was that propagate was false in XsendEvent, in all the code. So that was one source of the problem back then.

Back then we had (We still do have in "fallback" in the current code)

if (own_window_type.get(l) != TYPE_DESKTOP) { input_mask |= ButtonPressMask | ButtonReleaseMask | PointerMotionMask | EnterWindowMask | LeaveWindowMask; }

PointerMotionMask is infact, MotionNotify. So it can be inconsistent depending on how it's used. (I have not looked through the whole code though)

Interestingly I did not see anything in the code(Current or Previous), referencing ButtonMotionMask

From the Documentation

ButtonMotionMask

The client application receives MotionNotify events only when at least one button is pressed. 

Using it could be useful.



Now, just so I understand...We want Conky itself to be interactive with clicks and motion, right? We don't just want to forward clicks/motion to anything below conky? Am I correct? As far as I understand, when own_window=false is used...Interactivity can be difficult or impossible. Because then Conky will be just like the wallpaper. Are we trying to achieve interactivity even in this case? (This also conflicts with at least picom...Doesn't allow own_window=false, could be with other compositors as well) Lastly, Can I also get your conkyrc config with which you would be testing this interactivity, please. I don't have any configs with buttons and stuff.
lineage-of-roots commented 5 months ago

Did some quick testing.

The performance hit is still there. Across all window managers floating/tilling. The peaks vary.

Most noticeable in Openbox and Fluxbox.

Now even in Openbox the menu can be delayed slightly, depending on how much mouse movement there was. The menu on Openbox can also not appear sometimes, depending on where the pointer moved from and how much movement there was prior.

Move the mouse around for a few seconds to see the jump. Temperature climbs and my fans rev up lol.

The performance hit predictably gets even worse if there's more than one conky instance running. Such as when config files are separated.

I will later test on my Laptop as well to compare.

lineage-of-roots commented 5 months ago

So here are some sources of the performance hit I came across

This loop in x11.cc

    for (int i = 0; i < numChildren; i++) {          
      Window *newRoot = None;

It will be executed on three separate occasions per pixel move.

The three sources of its execution are here in display-x11.cc

  Window event_window =
      query_x11_window_at_pos(display, data->root_x, data->root_y);   

  bool same_window = query_x11_top_parent(display, event_window) ==
                     query_x11_top_parent(display, window.window);

These queryx11*** functions all lead to that loop.

Now, numChildren can vary at any given point. There can be quite a few in there depending on individual setups and what other applications are open/running (Many applications will add several children, and can add more depending on layouts).

numChildren in that function is the same number you get if you do the command xwininfo -children -root

Suppose we had 100 children of root, then that loop will run for a total of 100x3=300 times per pixel move in the current code.

Another place is in mouse-events.cc

  std::bitset<32> buttons;  
  for (size_t bi = 1; bi < source->buttons.mask_len * 8; bi++) {
    if (XIMaskIsSet(source->buttons.mask, bi)) buttons[bi] = true;
  }

  std::map<size_t, double> valuators{};
  double *values = source->valuators.values;
  for (size_t vi = 0; vi < source->valuators.mask_len * 8; vi++) {
    if (XIMaskIsSet(source->valuators.mask, vi)) valuators[vi] = *values++;
  }

These loops will execute 256+64=320 times per pixel move

There are many other executions per pixel move, but these are some of the major ones.

EDIT:: On window managers without virtual roots(Openbox, Fluxbox, IceWM to name a few), the above loop with numChildren never breaks. The mask_len is by own_window=true and own_window_type=normal



Suggestions We could perhaps do XQueryPointer less often (every 0.1 seconds perhaps?) instead of every pixel move. Could further improve by working out and remembering the size & position of the conky window at launch, and executing more stuff only if the pointer is within that rectangle.(in cases where the usual xevent stuff is not sufficient) I bypassed much of the execution just to get ge the x, y position of the pointer, and it seems that XQueryPointer just by itself is also quite cpu intensive. However, it did considerably lower the cpu usage of conky and xorg.
Caellian commented 4 months ago

image

image

Seems we're spending most of the time waiting for VRootWindowOfScreen to find __SWM_VROOT (that is to query all window properties).

Caellian commented 4 months ago

@jborme let me know if this issue needs to be reopened. The PR that fixes performance issues has been merged into main so you can hopefully use it now.

lineage-of-roots commented 4 months ago

@Caellian

I found some of the answers I was looking for here https://github.com/brndnmtthws/conky/wiki/Mouse-Events

However, while trying to test out button functionality, I came across an issue with cairo and pagocairo. (pangocairo is missing from the options. 1.19.8 is linked to both and works with cairo)

conky: llua_load: /home/me/.conky/experiment/experiment.lua:1: module 'cairo' not found:
    no field package.preload['cairo']
    no file '/usr/local/share/lua/5.4/cairo.lua'
    no file '/usr/local/share/lua/5.4/cairo/init.lua'
    no file '/usr/share/lua/5.4/cairo.lua'
    no file '/usr/share/lua/5.4/cairo/init.lua'
    no file '/usr/local/lib/lua/5.4/cairo.lua'
    no file '/usr/local/lib/lua/5.4/cairo/init.lua'
    no file '/usr/lib/lua/5.4/cairo.lua'
    no file '/usr/lib/lua/5.4/cairo/init.lua'
    no file './cairo.lua'
    no file './cairo/init.lua'
    no file '/usr/local/lib/conky/libcairo.so'
    no file '/usr/local/lib/lua/5.4/cairo.so'
    no file '/usr/lib/lua/5.4/cairo.so'
    no file '/usr/local/lib/lua/5.4/loadall.so'
    no file '/usr/lib/lua/5.4/loadall.so'
    no file './cairo.so'

It wasn't getting linked, even though it was enabled in the cmake build options. I linked them manually, but Conky is showing me the above output.
It's looking for them everywhere except /usr/lib/libcairo.so and /usr/lib/libpangocairo-1.0.so (where version 1.198 is linked and working)

Caellian commented 4 months ago

pangocairo is missing from the options. 1.19.8 is linked to both and works with cairo

Pangocairo is a Wayland dependency and has no Lua bindings. Cairo requires you to enable BUILD_LUA_CAIRO option.

lineage-of-roots commented 4 months ago

I did that.

Turned it on via cmake -DBUILD_LUA_CAIRO=true also tried from GUI Cmake.

Same result.

Caellian commented 4 months ago

I did that. Turned it on via cmake -DBUILD_LUA_CAIRO=true also tried from GUI Cmake. Same result.

Ok, let's not comment here as it's not related to this issue. Open a separate issue or comment on the other one (#1867), so that jborme doesn't get notifications about unrelated things.

lineage-of-roots commented 4 months ago

Scratch that.

I fixed it.

It wants libcairo.so in /usr/local/lib/conky now.
Previously it was /usr/lib/

I just noticed it's closed. Had the link bookmarked in the browser.

jborme commented 4 months ago

The delay is fixed for me with the latest version