Cloudef / wlc

High-level Wayland compositor library
MIT License
330 stars 58 forks source link

gtk3 applications crash when trying to open a GtkMenu #210

Closed jplatte closed 7 years ago

jplatte commented 7 years ago

So after first submitting this issue on sway, I have tried and succeeded in reproducing it with orbment: When I open a menu in a gtk3 application (e.g. right click in a text view, pressing the hamburger button in Epiphany), the application terminates. This only happens with GtkMenus, not with the more recently added GtkPopovers. When using the gtk3 package from the official arch repositories, these crashes look like this:

(gedit:10785): Gdk-ERROR **: Error flushing display: Broken pipe
[1]    10785 trace trap (core dumped)  LC_ALL=C gedit

Weirdly, when using gtk3 with the typeahead patch (gtk3-typeahead in AUR) I don't get a core dump, and the error message looks different:

# LANG=en_US.utf8
Gdk-WARNING **: Lost connection to Wayland compositor.

# LANG=de_DE.utf8
Gdk-WARNING **: Error 71 (Protokollfehler) dispatching to Wayland display.

I'm about to rebuild the official gtk3 package with debug symbols to hopefully obtain a stack trace.

jplatte commented 7 years ago

So I've now tried the same thing with a self-built version of gtk3 without patches, with debugging symbols enabled, and have managed to get a stack trace from one of the Broken pipe error cases (I got the other one as well a few times). This is a stack trace I got from gedit:

#0  0x00007ffff4767ff1 in  () at /usr/lib/libglib-2.0.so.0
#1  0x00007ffff476a731 in g_log_writer_default () at /usr/lib/libglib-2.0.so.0
#2  0x00007ffff4768b8c in g_log_structured_array () at /usr/lib/libglib-2.0.so.0
#3  0x00007ffff4768e89 in g_log_structured () at /usr/lib/libglib-2.0.so.0
#4  0x00007ffff58cd7a7 in gdk_event_source_prepare (base=0x67db50, timeout=<optimized out>) at gdkeventsource.c:66
#5  0x00007ffff4761c89 in g_main_context_prepare () at /usr/lib/libglib-2.0.so.0
#6  0x00007ffff47626ab in  () at /usr/lib/libglib-2.0.so.0
#7  0x00007ffff476289c in g_main_context_iteration () at /usr/lib/libglib-2.0.so.0
#8  0x00007ffff780654d in g_application_run () at /usr/lib/libgio-2.0.so.0
#9  0x0000000000400c2a in main ()
jplatte commented 7 years ago

Here's the gdk_event_soure_prepare function from my local version of the source, with the line of the error message marked:

  static gboolean
  gdk_event_source_prepare (GSource *base,
                            gint    *timeout)
  {
    GdkWaylandEventSource *source = (GdkWaylandEventSource *) base;
    GdkWaylandDisplay *display = (GdkWaylandDisplay *) source->display;

    *timeout = -1;

    if (source->display->event_pause_count > 0)
      return _gdk_event_queue_find_first (source->display) != NULL;

    /* We have to add/remove the GPollFD if we want to update our
     * poll event mask dynamically.  Instead, let's just flush all
     * write on idle instead, which is what this amounts to.
     */

    if (_gdk_event_queue_find_first (source->display) != NULL)
      return TRUE;

    /* wl_display_prepare_read() needs to be balanced with either
     * wl_display_read_events() or wl_display_cancel_read()
     * (in gdk_event_source_check() */
    if (source->reading)
      return FALSE;

    /* if prepare_read() returns non-zero, there are events to be dispatched */
    if (wl_display_prepare_read (display->wl_display) != 0)
      return TRUE;
    source->reading = TRUE;

    if (wl_display_flush (display->wl_display) < 0)
*     g_error ("Error flushing display: %s", g_strerror (errno));

    return FALSE;
  }
jplatte commented 7 years ago

@Cloudef I really don't know much about window managers and such, but I guess I could try debugging this.. Any pointers on where to start? (I made sure this doesn't happen with gnome-shell, so it seems pretty clear that the next step is actually debugging a window manager build with wlc)

Cloudef commented 7 years ago

The problem is most likely that xdg is not fully implemented in wlc. For example it's lacking the positioner interfaces. wlc should log about unimplemented functions.

jplatte commented 7 years ago

Ah yeah. Ran orbment with logging, and got

[17:16:09.192] (WARN) wlc: xdg_cb_create_positioner @ line 140 is not implemented

in the log file. I'll look through the code and see if I can't implement this. Gonna be quite a bit of spec-reading I imagine..

Drakulix commented 7 years ago

@Cloudef Because this issue is really annoying me in my daily use, I would like to try to tackle it.

I browsed through the source code and got some questions:

Cloudef commented 7 years ago

@Drakulix Will get on this bit later today. I need to remind myself on the positioner details before I can answer well.

Cloudef commented 7 years ago

@Drakulix Sorry for late reply:

How would you like to see the positioner getting exposed by the api? It seems like it should almost be a new handle type. Alternatively don't expose it and just respect it's settings in the set_geometry call or stub it so it does not crash anymore.

If we can keep it out of api that would be great.

Would the positioner be a new resource in src/resources/types? How would I allocate it in the create_positioner call?

Yes, you'll put the implementation of positioner methods there.

Looking at the rest of the code, I just use wlc_resource_create and wlc_resource_implement, right? Implementing it then seems pretty straight-forward.

xdg-shell.c will be the parent that calls the wlc_resource_create and holds the positioner object. You need to add new wlc_source for positioners in xdg_shell's struct, and initialize it in wlc_xdg_shell function in xdg-shell.c, the positioner resources will be then tracked and constructor / destructors (if any) called automatically when resources are created / destroyed from that source.

kozec commented 7 years ago

Hi there.

Funny thing about this positioner thing. I implemented it, just with implementation that creates positioner that stores nothing so far, and I found that GTK apps are crashing anyway:

zxdg_popup_v6@29: error 1: xdg_cb_popup_grab @ line 27 is not implemented

Is there any xdg_cb_popup_grab implementation around?

Cloudef commented 7 years ago

Nope, no popups grabs, more things that needs to be implemented. Anything that creates a wayland resource can't be simply stubbed (they at least need dummy resource) or the client will indeed abort.

kozec commented 7 years ago

Actually, just commenting out stub in xdg_cb_popup_grab "fixes" weston-terminal. Gnome-terminal menu is automatically dismissed, but at least it doesn't crash.

Looks like I need to find what's popup grab supposed to do.

//edit:

Anything that creates a wayland resource can't be simply stubbed (they at least need dummy resource)

Implementation above creates dummy resource.

Cloudef commented 7 years ago

Actually, just commenting out stub in xdg_cb_popup_grab "fixes" weston-terminal. Gnome-terminal menu is automatically dismissed, but at least it doesn't crash.

Yeah "stub" is something that throws wayland protocol error. There is another macro "stubl" which just logs warning but doesn't throw protocol error.

kozec commented 7 years ago

Funny thing is, that entire xdg_cb_popupgrab thing looks like reimplementation of X11 bug where application can basically disable user's keyboard, including A-C-Backspace and A-C-Del --

It also doesn't looks like compositor can ignore that request. Only other option is to dismiss popup automatically.

What are you doing in similar cases usually? :D

Drakulix commented 7 years ago

It also doesn't looks like compositor can ignore that request. Only other option is to dismiss popup automatically.

What happens if you do neither in the implementation?

kozec commented 7 years ago

What happens if you do neither in the implementation?

GtkMenu and GtkPopup flashes on screen for a moment and then disappears forever. I can control them blidnly using keyboard, but they are not drawn anywhere.

Of course, there is no error from GTK anywhere...

Cloudef commented 7 years ago

It also doesn't looks like compositor can ignore that request. Only other option is to dismiss popup automatically.

Think that is just weston specific behaviour. It won't allow multible grabs. (Probably consistend with X11 perhaps? Remember the bug where screen lock / screensaver won't trigger if popup is open)

GtkMenu and GtkPopup flashes on screen for a moment and then disappears forever. I can control them blidnly using keyboard, but they are not drawn anywhere.

May also be bug in wlc somewhere.

kozec commented 7 years ago

Think that is just weston specific behaviour. It won't allow multible grabs. (Probably consistend with X11 perhaps? Remember the bug where screen lock / screensaver won't trigger if popup is open)

That behavior is actually defined in xdg-shell-unstable-v6.xml. "If the compositor denies the grab, the popup will be immediately dismissed."

This may be pretty big problem, because while allowing multiple grabs may not be necessary, GTK-based stuff clearly depends on possibility to have at least one.

// edit: I caputred some log from gnome-text-editor exposing same behavior.

http://pastebin.com/BgweN4rc

Maybe problem is really caused only by not having positioner (and setting size) implemented?

// edit: Yep,

LogType.WARN xdg_cb_popup_grab @ line 27 is not implemented
zxdg_popup_v6_send_configure 0,0 0x0
zxdg_popup_v6_send_configure 0,0 183x243
on_view_request_geometry 4 -> (0, 0) :: (171L, 231L)
zxdg_popup_v6_send_configure 100,100 171x231
zxdg_popup_v6_send_configure 100,100 171x231
on_view_request_geometry 4 -> (100, 100) :: (0L, 0L)
zxdg_popup_v6_send_configure 100,100 0x0
zxdg_popup_v6_send_configure 100,100 0x0

Now only if I knew where that 0x0 size comes from...

Cloudef commented 7 years ago

That behavior is actually defined in xdg-shell-unstable-v6.xml. "If the compositor denies the grab, the popup will be immediately dismissed."

It's still up to compositor to decide what denying a grab means though?

kozec commented 7 years ago

It's still up to compositor to decide what denying a grab means though?

May be, but it will block user from using menus in any case :D

kozec commented 7 years ago

Ok, I got menus to behave simply by restricting minimal view size, so I believe that respecting data in positioner should fix the problem.

But I'm wondering about keeping it from API discussed above. Shouldn't positioner data be something available to request_geometry_cb? Otherwise it will be next to impossible to keep menus and popups displayed on place where user expects them.

Drakulix commented 7 years ago

My original idea was to make all geometry of the public API relative to the anchor rect (respecting gravity and all that stuff), if one exists (means if a positioner exists).

Additionally add a new public function wlc_view_get_anchor_rect which exposes the anchor rect (if it exists), so the compositor might calculate the absolute position, if necessary, but denying to position the view outside of it, by restricting all geometry to be relative.

That would keep most of the current api with minimal changes, but on the other hand forces correct behavior on any implementation. I am not sure, if that is desired.

Otherwise this would most likely require a lot of new functions to retrieve all the information of the positioner or even a new handle type for the positioner.

Cloudef commented 7 years ago

@Drakulix Doesn't sound like bad idea to me.

In any case you can freely experiment what works best. I haven't been doing wm / wayland development for while now, so I don't have much weight on how it all glues together, but will point out the obvious if I see it. As far I see there are currently already some difficulties with the many different kinds of popups / tooltips in wayland / xwayland land which either show correctly or not, and even now WM needs to check view types and all that shit to make it behave "correctly". So I'm all for if you manage to find better solutions for what we currently have. One thing I can suggest though, is not to let wlc control any position automatically or so, that was something I did earlier and there still may be some leftovers and it was just can of worms. Its better just to give hints (if view / surface has any) to compositor and let it decide where stuff will eventually end up to. Of course wayland protocols has slowly started to stack more and more complex parenting relationships, transformations and region book keeping which make all this quite PITA.

(One example in orbment was that I wanted to just ignore all requests from clients how modals should be positioned and just tile them on top of parent window)

Drakulix commented 7 years ago

As far I see there are currently already some difficulties with the many different kinds of popups / tooltips in wayland / xwayland land which either show correctly or not, and even now WM needs to check view types and all that shit to make it behave "correctly".

This comes mostly down to bad positioning. All compositors most likely want them to behave "correctly" instead of tiling them or similar. Maybe wlc should provide at least some help here (maybe some helper function for calculating the recommended popup position?). Everybody ends up implementing the same shit otherwise.

One thing I can suggest though, is not to let wlc control any position automatically or so, that was something I did earlier and there still may be some leftovers and it was just can of worms. Its better just to give hints (if view / surface has any) to compositor and let it decide where stuff will eventually end up to.

That is a contradiction to calling my idea to be good, because that would mean the compositor is allowed to place the popup anyway, including outside of its parent.

Anyway I am fine with any any idea you propose to be the "right" one for wlc. I am just sick of GTK apps crashing or being unusable and want to do something about it without wasting my time, because you might not like the API I am proposing. That is why I want to talk about all that stuff before hand. :)

I suppose we could just expose the positioner properties through new view functions and not handle it in wlc at all. That should be pretty easy to do in @kozec 's fork

kozec commented 7 years ago

Sooo... http://i.imgur.com/iJVmx8S.png

I added two APIs, wlc_view_get_requested_size and wlc_view_get_requested_anchor_rect. Positioner now remembers everything, but rest is not accessible so far.

There is just one problem left. zxdg_popup_v6_send_configure is called before request_geometry_cb, what breaks GTK (again). It means that menu gets zxdg_popup_v6_send_configure with zero size first and configured, correct sizes later, but GTK apparently uses only first recieved values.

I added one ugly hack to prevent that, but someone with better idea about how are callbacks called may be able to fix problem entirely by calling request_geometry_cb first.

Drakulix commented 7 years ago

I added two APIs, wlc_view_get_requested_size and wlc_view_get_requested_anchor_rect. Positioner now remembers everything, but rest is not accessible so far.

What exactly is requested size? Maybe we should expose min and max_size instead. If those are identical the requested size is quite obvious for the compositor.

I would call wlc_view_get_requested_anchor_rect just wlc_view_get_anchor_rect. The protocol makes it a hard constraint, its just up to the compositor to ignore that.

The rest should also be accessible to make correct positioning possible.

kozec commented 7 years ago

What exactly is requested size? Maybe we should expose min and max_size instead. If those are identical the requested size is quite obvious for the compositor.

requested size is what positioner.set_size defines in specs. I don't think there is any min and max size for positioner. Basically, it's size that client expect to get, menu window size in case I'm working with.

I would call wlc_view_get_requested_anchor_rect just wlc_view_get_anchor_rect. The protocol makes it a hard constraint, its just up to the compositor to ignore that.

No problem with that, but positioner contains "requests" for position, size, offset, gravity, anchor and constraint_adjustments, so I thought having common prefix for all of that would look less messy. I'm sure that at least view_get_size and view_get_position already exists.

By the way, how are you exposing wayland enums in API? What should return wlc_view_get_anchor, internally represented as zxdg_positioner_v6_anchor ?

Drakulix commented 7 years ago

No problem with that, but positioner contains "requests" for position, size, offset, gravity, anchor and constraint_adjustments

Good point. The name just clutters with the wlc_view_requests_geometry_cb and the other callbacks a like. So I think those should be named differently

kozec commented 7 years ago

Ok, how about wlc_view_get_positioner_anchor_rect, wlc_view_get_positioner_size etc?

There is also possibility to create just wlc_view_get_positioner_data and return this entire thing.

Cloudef commented 7 years ago

That is a contradiction to calling my idea to be good, because that would mean the compositor is allowed to place the popup anyway, including outside of its parent.

You are right, I misunderstood your original post as the anchor rect getter being more like request. Should have read the first words more properly.

Cloudef commented 7 years ago

By the way, how are you exposing wayland enums in API? What should return wlc_view_get_anchor, internally represented as zxdg_positioner_v6_anchor ?

The enums are wrapped in wlc. Only time developer has to know anything about the wayland headers (or the extensions) is when he/she is about to use the wlc-wayland.h and implement wayland extension.

kozec commented 7 years ago

The enums are wrapped in wlc. Only time developer has to know anything about the wayland headers (or the extensions) is when he/she is about to use the wlc-wayland.h and implement wayland extension.

Heh. This is going to be fun.

Okay, and is there any problem with having one getter for entire positioner structure?

Cloudef commented 7 years ago

Okay, and is there any problem with having one getter for entire positioner structure?

Well, if you want one, then that would be ABI stability. If the struct contents change often it's going to cause version incompatiblity.

kozec commented 7 years ago

If the struct contents change often

What's something that depends on changes in Wayland. Got it :(

Drakulix commented 7 years ago

Well, isn't the API expected to be adjusted to breaking protocol changes anyway? Keeping backwards compatibility at any price for unstable protocols is not helpful either.

Also additions to the struct are not breaking changes. Like the xdg-shell v6 changes, that mostly introduced new features.

Cloudef commented 7 years ago

Just saying, wlc isn't marked as stable either. Whether you want to keep the api opaque or not is up to you.

ddevault commented 7 years ago

For the sake of usability it makes sense to support both at the same time imo. And I don't imagine that codewise that would be too terribly difficult tbh.

ddevault commented 7 years ago

Only because not everyone has switched to v6, by the way. I would wager about half are still on 5.

kozec commented 7 years ago

To make that decision easier, I already have all _wlc_view_positionerget_* implemented and now I'm solving fact that for compositor to be able to compute position, wlc needs to store popup parent.

kozec commented 7 years ago

Here. With above, GTK menus are shown without crashing and compositor can display them correctly positioned.

Cloudef commented 7 years ago

Views already have parent / child relationship, do the popups really need own?

kozec commented 7 years ago

Views already have parent / child relationship, do the popups really need own?

Nope, I was not aware of that.

//edit: Anyway, that only means that last commit is not needed, just using view_get_parent works. I can tidy rest up and make PR.

Fale commented 7 years ago

A Fedora user is experiencing the same bug as for https://bugzilla.redhat.com/show_bug.cgi?id=1410900. (I'm reporting this to keep all the info about this linked)

kamiyaa commented 7 years ago

I'm having the same issue with gtk+ 3.22.x. Downgrading to gtk+ 3.20.x seems to fix the problem, though gtk+ 3.20 has problems of its own as well.