brndnmtthws / conky

Light-weight system monitor for X, Wayland (sort of), and other things, too
https://conky.cc
GNU General Public License v3.0

[Bug]: (Very) slow `same_window` check in mouse input event #1886

Open Caellian opened 2 weeks ago

Caellian commented 2 weeks ago

What happened?

Unfortunately, another problem arose! Scripts stop updating on screen after a few seconds. .xsession-errors logs the following:


conky: FOUND: console
conky: FOUND: ncurses
conky: FOUND: file
conky: FOUND: x11

conky: 'openbox' x11 session running 'openbox' destop


> Then the scripts stop updating the screen.
>
> Not only do the scripts that need " require 'cairo_xlib' " stop functioning, but it seems every conky process stops responding. Can't kill running processes either; they simply respawn before they all shut down.
>
> Don't know if this helps, but the moment you touch your mouse, conky stops responding.

_Originally posted by @belrus65 in https://github.com/brndnmtthws/conky/issues/1867#issuecomment-2088499835_

### Version

1.20.2

### OS

Gentoo

### Config

<details><pre>
conky.config = {
  background = true,
  update_interval = 1,
  double_buffer = true,
  no_buffers = true,
  xinerama_head = 1,
  alignment = "top_left",
  gap_x = 96,
  gap_y = 832,
  maximum_width = 190,
  minimum_width = 190,
  minimum_height = 190,
  own_window = true,
  own_window_class = "Conky",
  own_window_title = "Core1-Rings",
  own_window_type = "desktop",
  own_window_transparent = true,
  own_window_argb_visual = true,
  own_window_argb_value = 255,
  draw_shades = false,
  draw_outline = false,
  draw_borders = false,
  draw_graph_borders = false,
  border_inner_margin = 0,
  border_outer_margin = 1,
  border_width = 1,
  use_xft = true,
  xftalpha = 1,
  override_utf8_locale = true,
  default_color = "FFFFFF",
  color0 = "FFFFFF", -- white
  color1 = "00FFFF", -- cyan1
  color2 = "007F7F", -- cyan2
  lua_load = "/data/configs/conky/dreamlan/scripts/draw-shapes.lua",
  lua_draw_hook_pre = "conky_main cyan cpu1"
};

conky.text = [[
  ${execpi 1 /data/configs/conky/dreamlan/scripts/get-core-info.sh 1 13}
]]
</pre>
</details>

Caellian commented 2 weeks ago

@belrus65 Can you send me your latest lua script as a comment attachment (might need to add .txt extension)?

I have been debugging mouse events a lot in Openbox over the past few weeks and they seemed fine, so it might be a weird interaction with cairo.

I also have a few questions:

belrus65 commented 2 weeks ago

lua-scripts.zip Screenshot-from-belrus65 xwininfo-results-conky-1.19.8.txt xwininfo-results-conky-1.20.2.txt

Here are the files. I have a pure Openbox Gentoo system with 3 monitors running several conky panels on each monitor (I've included a screenshot of my desktop to make it easier to explain). The problem starts as soon as I load up to 4 Lua scripts (clock and memory bar on the center monitor, and CPU usage dials on the other monitors). If I move the mouse or interact with the desktop (bring up a menu with key bindings), the conky processes seem to stall for some time (the more I move the mouse, the longer the stall lasts). None of these problems existed in the previous version (1.19.8).

lineage-of-roots commented 2 weeks ago

@belrus65

Can I see your conkyrc files as well, please? (For investigation, and I like your layout :) )

You are running multiple instances of conky, right? Like through separate conkyrc files?

belrus65 commented 2 weeks ago

Do you need all my conkyrc, or only the ones running the lua scripts? Yes, there are exactly 28 different panels all executed from a bash script.

lineage-of-roots commented 2 weeks ago

All of them :) and the script executing them

Caellian commented 2 weeks ago

@belrus65 Right, as a workaround, you can try building conky with BUILD_XINPUT and/or BUILD_MOUSE_EVENTS disabled. One of those is likely causing issues.

belrus65 commented 2 weeks ago

OK @lineage-of-roots, I'll package it all together as soon as I get back home.

lineage-of-roots commented 2 weeks ago

@belrus65 Thanks in advance

Caellian commented 2 weeks ago

Related to #1852.

belrus65 commented 2 weeks ago

Here is my entire conky @lineage-of-roots.

@Caellian, I'm in the middle of a deployment for a client, I will try to build conky with BUILD_XINPUT and/or BUILD_MOUSE_EVENTS disabled over the weekend.

Caellian commented 2 weeks ago

> @Caellian, I'm in the middle of a deployment for a client, I will try to build conky with BUILD_XINPUT and/or BUILD_MOUSE_EVENTS disabled over the weekend.

That's fine, try it out when you have time. Thanks for sharing the files, it will be helpful for debugging.

lineage-of-roots commented 2 weeks ago

@belrus65

Thanks for the files. Much appreciated.

I ran them, they don't stall for me though.

I would, however, also suggest simply building conky without XInput.


For a deeper dive/explanation: I would say this also has something to do with your kernel config/build (the same goes for your Xorg and Openbox builds/configs), and probably also any "nice" values you might have set. I am guessing your Gentoo kernel is personally customized.

Conky with XInput enabled makes several kinds of queries to Xorg for every pixel your mouse moves (some of it is also related to mouse polling rates). Since you have many separate instances of conky running at the same time, they all hit Xorg at the same time. This is where your builds and configs for the kernel, Xorg, Openbox and so on come in.

On my system I just see a large jump in CPU usage, but no stalls or hang-ups. The quickest solution is to build conky without XInput.

belrus65 commented 2 weeks ago

Thanks @lineage-of-roots, I will disable all unneeded flags this weekend and rebuild.

belrus65 commented 2 weeks ago

Strange though, did that much of the code change from the previous version? I never had any issues with lag before, and nothing changed on my system in terms of Xorg or Openbox (newer kernel, but it's the Gentoo binary). In any case, it will still be good to streamline the conky package to my minimal requirements. Thanks again for the feedback.

lineage-of-roots commented 2 weeks ago

@belrus65 You're welcome

Yes, the code has changed.

And it can behave differently across different kernels and distros.

Previously, when I was testing conky on my Arch desktop, I could get a delay in the Fluxbox menu appearing after I right-clicked on the desktop. The Fluxbox menu was not delayed on my laptop, which is running Debian (MX-Linux Wildflower), even though the CPU usage would jump high on both.

So the only conclusion was that different builds/configs are dealing with XInput differently.

(This is, of course, if conky is built with XInput support enabled.)

Caellian commented 2 weeks ago

Just want to point out a few things though:

It would be nice if you could run conky (built with `-DCMAKE_BUILD_TYPE=RelWithDebInfo`) with valgrind when you have time (`valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes conky`) and send the callgrind.out file, as that would tell us what part of code is causing this slowdown.
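
The two commands mentioned above can be assembled as in the following sketch. The build-type flag and the valgrind options are the ones quoted in this comment; the source and build directory names, the helper function names, and the optional `-c` config argument are assumptions added for illustration. The script only prints the commands, it does not run them.

```python
import shlex


def configure_command(source_dir=".", build_dir="build"):
    """CMake configure step: optimized build that keeps debug symbols."""
    # source_dir and build_dir are placeholder paths, not taken from the thread.
    return [
        "cmake", "-S", source_dir, "-B", build_dir,
        "-DCMAKE_BUILD_TYPE=RelWithDebInfo",
    ]


def build_command(build_dir="build"):
    """CMake build step for the directory configured above."""
    return ["cmake", "--build", build_dir]


def callgrind_command(conky_binary="conky", config=None):
    """valgrind invocation using the callgrind options quoted in the comment."""
    cmd = [
        "valgrind", "--tool=callgrind", "--dump-instr=yes",
        "--simulate-cache=yes", "--collect-jumps=yes", conky_binary,
    ]
    if config is not None:
        # conky's -c option selects a specific configuration file.
        cmd += ["-c", config]
    return cmd


# Print the commands so they can be reviewed before being run by hand.
for cmd in (configure_command(), build_command(), callgrind_command()):
    print(shlex.join(cmd))
```

By default callgrind writes its profile to a file named `callgrind.out.<pid>` in the working directory, which is the file to attach.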

Assuming it's down to the kernel setup, I'm marking this as wontfix, but send the callgrind file and I'll see whether I can improve performance on your system (or possibly for others) somehow, so you don't have to rebuild your kernel or disable default functionality in order to use newer versions :face_with_spiral_eyes:.

belrus65 commented 2 weeks ago

@lineage-of-roots, I created a custom local ebuild repository for conky, and set -DBUILD_XINPUT=no & -DBUILD_MOUSE_EVENTS=no in the ebuild file, and re-emerged the 1.20.2 package on my system. All is working fine now, no more lag.

conky --version
conky 1.20.2 compiled for Linux x86_64

Compiled in features:

System config file: /etc/conky/conky.conf
Package library path: /usr/lib64/conky

General:

belrus65 commented 2 weeks ago

@Caellian, I am not sure how to modify the ebuild file to include your suggestion:

"It would be nice if you could run conky (built with -DCMAKE_BUILD_TYPE=RelWithDebInfo) with valgrind when you have time (valgrind --tool=callgrind --dump-instr=yes --simulate-cache=yes --collect-jumps=yes conky) and send the callgrind.out file as that would tell us what part of code is causing this slowdown."

Caellian commented 2 weeks ago

> @Caellian, I am not sure how to modify the ebuild file to include your suggestion.

Build without ebuild/portage. It's 2 or 3 commands, but here's a step-by-step guide in the terminal:

lineage-of-roots commented 2 weeks ago

@belrus65

Glad it worked out.

I shall cherish your conky configs.

Thanks again :)

lineage-of-roots commented 1 week ago

@belrus65

I just noticed you left your api key for the weather in your files.

Just wanted to inform you of that.

I won't be using it. But your files are visible here.

belrus65 commented 1 week ago

@lineage-of-roots, thanks for the heads up! I was in the process of working on a deployment for a client, hurried to upload the files for you, and didn't pay much attention. I deleted the package for now and will replace it later tonight after I've had time to review the files.

Caellian commented 1 week ago

@belrus65 No need to review and replace them. Given that lineage-of-roots wasn't able to recreate the bug with your scripts, they're not that useful for debugging the issue.

belrus65 commented 1 week ago

callgrind.out.txt

@Caellian, here is the valgrind result

Caellian commented 1 week ago

Sorry to bother, you did everything right, but you stopped conky a bit too soon. It didn't get to the main event loop yet. Leave it on a minute or two longer. Thank you for doing this btw.

belrus65 commented 1 week ago

@Caellian I reran valgrind, but this time I selected a conkyrc that calls a Lua script, and I keep getting (llua_do_call: function execution failed: attempt to call nil value) printed line after line in the terminal. I cannot Ctrl-C; it's stuck in an endless loop and I need to reboot to stop the process.

Caellian commented 1 week ago

You can run `pkill -KILL conky`, no need to reboot. If your system is frozen, you can Ctrl+Alt+F4 into another tty, log in, and then kill conky from there.

lineage-of-roots commented 1 week ago

@belrus65 No problem.

There's still a small issue with libcairo.so and libcairo_xlib.so when using the built conky executable. Even if CMAKE_SKIP_INSTALL_RPATH is set to true, the cairo stuff still doesn't work with the built executable.

My current workaround is to manually copy the libcairo.so and libcairo_xlib.so files from the build/lua directory into the /usr/local/lib/conky directory. That's when the cairo stuff starts working with an executable that is built from source but not installed. (I don't want to go through all the cmake stuff currently.)


Other than that, I doubt running only one instance of conky through valgrind will make the problem reappear. Your usual conky setup is, for the sake of humor, doing a DDOS attack on your Xserver lol, because of all the XInput stuff.

If you really want to get to the bottom of it, then you would probably have to do the following:

````
Remember not to touch your mouse until it's time.
1. Launch your usual conky instances through your start-conky-dreamlan.sh script
2. Once they are loaded, execute one more conky through valgrind
3. Wait till valgrind gets everything going
4. Move your mouse a bit
5. Wait a few seconds
6. Kill all the conkies. Stop Valgrind
````
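
The numbered procedure above could be scripted roughly as follows. This is a sketch under assumptions: the function name, the timing defaults, and the use of SIGINT to end the profiled instance are illustrative choices, not something the thread specifies, and the mouse movement in step 4 stays manual. The `popen`, `sleep`, and `prompt` parameters exist only so the sequence can be dry-run without starting anything.

```python
import signal
import subprocess
import time


def profile_under_load(start_cmd, profiled_cmd, settle_s=30.0, observe_s=5.0,
                       popen=subprocess.Popen, sleep=time.sleep, prompt=input):
    """Run the usual instances, then one more instance under a profiler."""
    # Step 1: launch the usual conky instances and give them time to load.
    launcher = popen(start_cmd)
    sleep(settle_s)

    # Steps 2-3: start one more conky under valgrind and wait for start-up.
    profiled = popen(profiled_cmd)
    sleep(settle_s)

    # Step 4: the mouse has to be moved by hand.
    prompt("Move the mouse a bit, then press Enter: ")

    # Step 5: let a few more update cycles run.
    sleep(observe_s)

    # Step 6 (partly): stop only the profiled instance. Assumption: SIGINT
    # ends the run cleanly enough for the profiler to write its output.
    profiled.send_signal(signal.SIGINT)
    profiled.wait()
    return launcher, profiled


# Hypothetical call; the script location is an assumption:
# profile_under_load(["./start-conky-dreamlan.sh"],
#                    ["valgrind", "--tool=callgrind", "conky"])
```

The sketch does not stop the other instances; those still have to be killed separately afterwards.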

The closest relevant info I found is on the libX11 GitHub page, which says:

````
Release 1.7.3
Fixes a hanging issue in _XReply() where the replying thread would wait for an event when another thread was already waiting for one.
````

Perhaps this is not as fixed as reported. Or it all might come down to some "scheduling" choices by the kernel in different builds/configs/distros in how they deal with the very high number of XInput requests.

The problem might also present itself in something like this chart:

![kcachegrind_chart](https://github.com/brndnmtthws/conky/assets/165924889/23d880ec-8ecc-4fe1-91e2-9e637e2f13ac)

Caellian commented 1 week ago

> Your usual conky setup is, for the sake of humor, doing a DDOS attack on your Xserver lol Because of all the XInput stuff

Propagation is still happening without the flags, in the same way. Conky doesn't propagate events that it doesn't catch. His desktop doesn't freeze, conky does, so I think this isn't a good analogy.

> how they deal with the very high number of XInput requests

There are no XInput requests. We tell XInput we want to listen for cursor movement; XInput constructs the event once and sends a copy to each client (including conky) that's listening for it when they poll events. Conky doesn't propagate events that aren't over its window, so this shouldn't affect performance when the cursor isn't over conky.
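
The delivery model described here can be pictured with a small simulation. It is only a toy model: the class and function names and the window geometry are hypothetical, and the rectangle test is a stand-in for conky's real window check, which compares windows rather than coordinates.

```python
from dataclasses import dataclass


@dataclass
class Client:
    """One listening client (for example one conky instance) and its window."""
    name: str
    x: int
    y: int
    width: int
    height: int
    received: int = 0   # event copies delivered to this client
    handled: int = 0    # copies that were over this client's window

    def contains(self, px, py):
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)

    def on_motion(self, px, py):
        self.received += 1
        if not self.contains(px, py):
            return False  # not over our window: drop it, do no further work
        self.handled += 1
        return True


def deliver_motion(clients, px, py):
    """Model of the server side: one event, one copy per listening client."""
    return [c.name for c in clients if c.on_motion(px, py)]


clients = [
    Client("panel-a", 96, 832, 190, 190),
    Client("panel-b", 400, 832, 190, 190),
    Client("panel-c", 700, 832, 190, 190),
]
print(deliver_motion(clients, 100, 850))   # pointer over panel-a only
print(deliver_motion(clients, 10, 10))     # pointer over none of the panels
print([(c.name, c.received, c.handled) for c in clients])
```

Every client receives every copy, so the per-event cost of deciding "is this over my window?" is paid by each instance even when the answer is no.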

_XReply is definitely going to wait in the affected code, because there's no other code that interacts with X11 after the window is constructed (besides resizing if the content size changes).

belrus65 commented 1 week ago

callgrind.out.txt

Here is the valgrind output. I had to manually copy libcairo.so and libcairo_xlib.so to the /usr/local/lib/conky folder (which I also needed to create manually). Hope this helps you. As for myself, the lag is gone since I disabled XINPUT and MOUSE_EVENTS in my custom build script.

lineage-of-roots commented 1 week ago

> Propagation is still happening without the flags, in the same way. Conky doesn't propagate events that it doesn't catch. His desktop doesn't freeze, conky does, so I think this isn't a good analogy.

I did mean that in the humorous sense. And not related to any event propagation. I knew his conky was freezing and not his desktop.

You can see Xorg jump up in CPU usage when you move the mouse around anywhere, on any window. With many conky instances running, Xorg ends up working overtime. His script launches over 50 separate conky instances, if I counted correctly.

To simulate the effect, I launched many xeyes. A similar jump in Xorg's cpu usage can be observed then.

> how they deal with the very high number of XInput requests

I shouldn't have said "requests". That sounds like an actual request.

I meant all the XInput-related code running.

My hypothesis was/is that something is going wrong with conky (similar to the Fluxbox menu delay) because of all the separate but simultaneous XQueryPointer calls running. It's possible that XQueryPointer is not the problem, and it just results in high CPU usage as usual.

From my previous testing on Debian, I knew the problem doesn't necessarily manifest across different Distros, Kernels etc etc.

Caellian commented 1 week ago

It still didn't reach main.

But the issue is likely specific to your system, or maybe it's some combination of circumstances I can't seem to recreate, because I'm not experiencing any issues even with 100 running instances. The code in question does take about 50% of main_loop for a simple configuration (handle_event<1>), and a quarter of that is waiting for replies from Xorg:

image
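
Put as arithmetic, and reading "a quarter of that" as a quarter of the time spent in handle_event<1>, the two approximate figures combine as follows (the variable names are made up for this illustration):

```python
# Shares taken from the comment above (approximate figures read off a profile).
handle_event_share_of_main_loop = 0.50
xorg_wait_share_of_handle_event = 0.25

# Share of main_loop spent waiting for replies from Xorg.
xorg_wait_share_of_main_loop = (
    handle_event_share_of_main_loop * xorg_wait_share_of_handle_event
)
print(f"Waiting on Xorg: about {xorg_wait_share_of_main_loop:.1%} of main_loop")
```

That is roughly one eighth of main_loop spent blocked on the X server in this simple configuration.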

The only function that stands out as needing optimization is query_x11_top_parent, which is likely the cause of the slowdown on your system (namely, its use of XQueryTree), but I don't see a way around it that would work everywhere: query_x11_top_parent was required to make event window matching work on Openbox.
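
To illustrate why a top-parent lookup can be costly, here is a toy model that assumes the lookup climbs the window tree one parent query at a time, which the comment implies but the thread does not show. In Xlib each XQueryTree call is a synchronous round trip to the server; here a dictionary lookup stands in for it. The window IDs, the tree shape, and the function name are hypothetical and are not conky's actual code.

```python
ROOT = 1


def top_parent(parents, window, root=ROOT):
    """Return (top-level ancestor of window, number of parent lookups)."""
    lookups = 0
    current = window
    while True:
        parent = parents[current]   # stands in for one XQueryTree round trip
        lookups += 1
        if parent is None or parent == root:
            # current is the root itself or a direct child of the root.
            return current, lookups
        current = parent


# Hypothetical tree: root (1) -> frame (10) -> client (20) -> subwindow (30).
parents = {1: None, 10: 1, 20: 10, 30: 20}

print(top_parent(parents, 30))   # subwindow: three lookups to reach the frame
print(top_parent(parents, 10))   # frame: already top-level, one lookup
```

The number of round trips grows with how deeply the window is nested, and in this model that cost is paid again for every event that triggers the check.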

Anyway, here's a breakdown of the relative costs of the different parts of mouse event handling, in case someone needs it in the future: image

I might look at this a bit more at some point, but I don't see any obvious ways to improve it.