emersion / xdg-desktop-portal-wlr

xdg-desktop-portal backend for wlroots
MIT License

Bug in sd-bus polling code causes hang #281

Closed cillian64 closed 7 months ago

cillian64 commented 10 months ago

I've been hitting a bug where, on session start, xdg-desktop-portal (xdp) occasionally hangs in xdp_dbus_impl_screenshot_proxy_new_sync. This blocking call, amongst other things, starts xdg-desktop-portal-wlr (xdpw) and calls GetAll on org.freedesktop.DBus.Properties on xdpw. Based on dbus-monitor logs it looks like xdpw doesn't respond to the GetAll call for several minutes. Eventually xdpw receives another dbus message and then responds to both immediately.

I believe this is because of a bug in how xdpw uses poll() with sd-bus. The docs for sd_bus_get_fd require that all three of sd_bus_get_fd, sd_bus_get_events, and sd_bus_get_timeout are called for every invocation of poll. xdpw only calls sd_bus_get_fd once before its main loop and never calls sd_bus_get_events or sd_bus_get_timeout.
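One detail of that per-iteration contract is easy to miss: as far as I understand the sd_bus_get_fd man page, sd_bus_get_timeout reports an absolute deadline in microseconds on the CLOCK_MONOTONIC timeline (UINT64_MAX meaning "no deadline"), whereas poll() wants a relative timeout in milliseconds. A minimal sketch of that conversion follows. The helper name deadline_to_poll_timeout_ms is hypothetical and is not taken from xdpw or from #282.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: turn an absolute deadline in microseconds (the kind
 * of value sd_bus_get_timeout() hands back) into the relative millisecond
 * timeout that poll() expects. now_usec is the current CLOCK_MONOTONIC time
 * in microseconds, obtained by the caller. */
static int deadline_to_poll_timeout_ms(uint64_t deadline_usec, uint64_t now_usec)
{
    if (deadline_usec == UINT64_MAX)
        return -1;                      /* no deadline: poll() may block */
    if (deadline_usec <= now_usec)
        return 0;                       /* already due: do not block at all */

    uint64_t diff = deadline_usec - now_usec;
    /* Round up so we never wake before the deadline and spin. */
    uint64_t ms = diff / 1000 + (diff % 1000 != 0);
    return ms > (uint64_t)INT_MAX ? INT_MAX : (int)ms;
}
```

In a main loop, each iteration would call sd_bus_get_fd, sd_bus_get_events and sd_bus_get_timeout, pass the events mask and the converted timeout to poll(), and then call sd_bus_process until it reports no more work. A deadline of 0 converts to a timeout of 0, which is exactly the "don't block, there is already something to handle" case.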

What's happening with my hang is that sd-bus has a message in its receive queue before xdpw polls. sd_bus_get_events and sd_bus_get_timeout would both return 0 to indicate that poll should return immediately so we can handle the message. Instead, xdpw goes into poll with no timeout. Because the received message has already been read from the FD to the sd-bus receive queue, poll will not see it and we don't handle the message until another arrives.
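The mechanism can be illustrated without sd-bus at all. The sketch below uses a plain pipe and a hypothetical buffered_conn type standing in for a library that reads from the fd into its own user-space queue. None of these names come from sd-bus or xdpw. Once the bytes have been moved into the queue, poll() on the fd sees nothing, so the only safe timeout is the one derived from the queue state.

```c
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for a library that buffers input internally. */
struct buffered_conn {
    int fd;
    char queue[64];
    size_t queued;   /* bytes already read from fd but not yet handled */
};

/* Move whatever is currently readable on the fd into the user-space queue. */
static void conn_fill(struct buffered_conn *c)
{
    ssize_t n = read(c->fd, c->queue + c->queued, sizeof(c->queue) - c->queued);
    if (n > 0)
        c->queued += (size_t)n;
}

/* Timeout to hand to poll(): 0 if work is already queued, otherwise block. */
static int conn_poll_timeout(const struct buffered_conn *c)
{
    return c->queued > 0 ? 0 : -1;
}

/* Returns 1 if poll() reports the fd readable within timeout_ms, else 0. */
static int fd_readable(int fd, int timeout_ms)
{
    struct pollfd p = { .fd = fd, .events = POLLIN };
    return poll(&p, 1, timeout_ms) > 0 && (p.revents & POLLIN);
}
```

After a write to the pipe followed by conn_fill, fd_readable reports the fd as idle even though a message is waiting in the queue. Calling poll() with a timeout of -1 at that point would block until unrelated data arrives, which matches the hang described above; conn_poll_timeout returns 0 instead, so the loop falls straight through to handling the queued message.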

I've got a provisional fix for this but I'd appreciate some comments on how it could be refined. I'll submit a PR shortly.

System details:

name-snrl commented 8 months ago

Hey @cillian64, what does it look like?

I have a problem where xdpw randomly delays xdp startup, similar to what happens if you don't set the variables: a 25-second delay, and after the timeout everything works fine. As I said, it happens randomly, which makes it very hard to figure out what the problem is.

cillian64 commented 8 months ago

That sounds quite similar to this. The specific symptom we saw was gtkmm applications hanging while xdp is deadlocked (which affected wf-panel-pi, the panel on Raspberry Pi OS).

A way to find out for sure would be to build an xdpw with my fix in #282 and see if the problem goes away.

name-snrl commented 8 months ago

Thanks for the quick response.

> A way to find out for sure would be to build an xdpw with my fix in #282 and see if the problem goes away.

Yeah, I plan to do it this weekend.

name-snrl commented 8 months ago

@cillian64 YES, it works. Thank you so much for #282. It completely solved my problem. No more delay at startup. Hope it'll be merged for the next release.

cillian64 commented 8 months ago

Good to hear, thanks for the update!