bbidulock / icewm

A window manager designed for speed, usability, and consistency

Strange Firefox Behavior in IceWM #742

Closed freonheat closed 5 months ago

freonheat commented 11 months ago

This one is really strange and really difficult to describe, but I will try. We have encountered a problem with a web-based EMR in Firefox. It only happens in Firefox versions 110 and above, and only in IceWM, of any version (we have tested up to the latest, 3.4.1).

On this EMR web site (of which we are just users, not developers), tables of results appear in many places throughout the "application", and the header row of each table holds the column names. If the mouse hovers over a column name, a previously hidden down arrow appears. If you click on the arrow, it presents a drop-down menu. It appears to be just normal JavaScript/HTML stuff. Screenshot of this menu: options_expected_behavior

When Firefox has focus and no other window has been chosen it works fine. But if I then click on any other application window, then click back on Firefox (which now has focus), the arrow will appear on hover, but clicking on it produces no menu. The only way to recover is to click on the taskbar button for Firefox (which minimizes it) and then again (to restore it) and then that function in Firefox in the EMR works again... until the user shifts focus to another window.

We can't find any other application which has this issue. We even tried on different Linux systems, also installing all kinds of desktop/window managers- KDE, LXDE, XFCE, Cinnamon, etc, no such problem in them, only when we use IceWM, even with all the defaults taken. We are utterly baffled by this. I don't even understand how it is possible that it is IceWM.

freonheat commented 11 months ago

Oh, one other tidbit, if it wasn't clear... it doesn't matter where Firefox is actually running, only where it is displayed. In other words, Firefox doesn't have to be running on the machine with the X server and IceWM running. I can ssh into a remote machine and launch Firefox 115 (for example) on the remote machine and it displays on my local machine's X server. If the local machine is running IceWM, we have the strange menu problem. If it is running something other than IceWM, the problem is not present. Just mind-bending!

freonheat commented 11 months ago

We discovered something else interesting. There is a problem in LibreOffice 7 when under IceWM (does not occur in LO 5 or low versions of 6). Open Writer, go to Tools> Autocorrect Options and under the Replacement Table tab, if that list is scrolled or you hover over buttons, the window continuously grows horizontally. Doesn't do this anywhere else I can find in LO, just that one dialog/tab. But if I display it remotely through ssh to a machine running a non-IceWM environment, the problem does not occur. Not sure if or how this might be related to the Firefox type problem, but I thought it relevant-enough to mention.

gijsbers commented 10 months ago

Very interesting. Is there perhaps a link to the EMR, so I can try to reproduce it? One question I have is if the menu is really a X11 window or just something internal to Firefox. If you can't provide a link, this is what I would do for a start to figure this out: run watch -d icesh -t -v list and then make the menu appear. If it is a X11 window, then I would be interested in its properties and how they change. Take note of the X11 handle of the menu and run something like this: icesh -w 0xe0010e properties spy and make the menu appear and disappear. Then change focus to some other app and back to FF and reproduce the problem.

freonheat commented 10 months ago

> Very interesting. Is there perhaps a link to the EMR, so I can try to reproduce it?

Sorry, that, unfortunately, isn't possible. It is not a public site. I am sure it probably affects other sites that are public, as well, but as of yet we haven't found any. Although you might be able to reproduce the LibreOffice issue, even though it might not be related.

> One question I have is if the menu is really a X11 window or just something internal to Firefox.

The menu is just part of the site rendering. Doesn't appear to be anything special. Just part of the html table. I might be able to provide some of the raw html, but it is a complex site and I am not sure that would help.

> If you can't provide a link, this is what I would do for a start to figure this out: run watch -d icesh -t -v list and then make the menu appear. If it is a X11 window, then I would be interested in its properties and how they change. Take note of the X11 handle of the menu and run something like this: icesh -w 0xe0010e properties spy and make the menu appear and disappear. Then change focus to some other app and back to FF and reproduce the problem.

I will see what I can come up with. But I am pretty certain that menu is not an X11 window.

gijsbers commented 10 months ago

If it is not an X11 window, then it is an internal bug to Firefox and there is no sense to spend any more time here, but you could submit a bug at https://bugzilla.mozilla.org/. You can also try and see if you can reproduce it under the Fluxbox window manager, because it has some technical similarity to icewm.

gijsbers commented 10 months ago

I can't reproduce your LibreOffice issue. It might perhaps be related to the specific fonts used? Run icesh spy on that dialog while you repeat the problem.

freonheat commented 10 months ago

> You can also try and see if you can reproduce it under the Fluxbox window manager, because it has some technical similarity to icewm.

Just tested. Works fine under Fluxbox as well. It is only when we use icewm that the problem occurs. I know- very very strange.

In icewm, when focus is returned to Firefox and the user hovers over the table column header, the down arrow appears, but clicking on the arrow is actually ignored and it sorts the column instead, which is the action that would happen if the user clicked anywhere on the header other than the down arrow.

freonheat commented 10 months ago

> I can't reproduce your LibreOffice issue. It might perhaps be related to the specific fonts used? Run icesh spy on that dialog while you repeat the problem.

RE: the LibreOffice Autocorrect Options dialog: icesh spy returns these:

    35:08.024: 0x2e00b87: WM_NORMAL_HINTS MinSize(1072,556) Base(0,0)
    35:08.026: 0x2e00b87: Configure 0x2e00b87 1072x556+0+0
    35:08.026: 0x2e00b87: Visibility PartiallyObscured
    35:08.026: 0x2e00b87: Visibility Unobscured
    35:08.027: 0x2e00b87: Configure 0x2e00b87 1072x556+650+237 Send
    35:08.029: 0x2e00b87: Leave Normal Nonlinear Nofocus
    35:08.046: 0x2e00b87: _NET_WM_OPAQUE_REGION = 0, 0, 1072, 556

each time the user hovers over any button, clicks in the list, or scrolls the mouse wheel. Each time, the window grows a little. I probably should have opened a different issue for this, since it is not related to Firefox. But it is similar that it only happens when run under IceWM and no other desktop/WM. lo_growing_dialog

freonheat commented 10 months ago

> I probably should have opened a different issue for this, since it is not related to Firefox. But it is similar that it only happens when run under IceWM and no other desktop/WM.

OK, we have found part of the problem with the LibreOffice one. It only happens when GTK_OVERLAY_SCROLLING=0 is set. And I was wrong: we could not replicate the problem on other systems with IceWM. So you can scratch all that (I will hide the postings).

So it is back to the Firefox issue. That one is replicating on other systems.

gijsbers commented 10 months ago

I am interested to see a much longer icesh spy of that dialog with the previous nonzero setting of GTK_OVERLAY_SCROLLING. At least a hundred lines. That could in fact be something upon which I can improve.

Since so far you haven't proven that the menu is an X11 window, icewm cannot be at fault in any way. A window manager only deals with toplevel X11 windows, and of those only the ones which have the override_redirect bit set to False. If a menu is an X11 window, it typically has this flag set to True and a window manager will ignore it. I repeat my hint for you to submit a bug at Mozilla. This topic can't go anywhere here.
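To make the rule above concrete, here is a minimal Python sketch of which windows a window manager would consider at all. The `Window` class, its fields, and the example window names are hypothetical stand-ins for illustration, not IceWM or Xlib data structures; only the rule itself (toplevel, and override_redirect set to False) comes from the comment above.

```python
from dataclasses import dataclass


@dataclass
class Window:
    """Hypothetical stand-in for an X11 window; not an Xlib or IceWM type."""
    name: str
    is_toplevel: bool        # direct child of the root window
    override_redirect: bool  # True asks the window manager to stay out of the way


def is_managed(win: Window) -> bool:
    """A window manager only handles toplevel windows without override_redirect."""
    return win.is_toplevel and not win.override_redirect


windows = [
    Window("browser frame", is_toplevel=True, override_redirect=False),
    Window("native popup menu", is_toplevel=True, override_redirect=True),
    Window("child widget inside the frame", is_toplevel=False, override_redirect=False),
]
managed = [w.name for w in windows if is_managed(w)]
print(managed)
```

A menu that is drawn purely by the page's HTML and JavaScript is not an X11 window at all, so it would never appear in such a list in the first place.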

That something only occurs under icewm doesn't prove that icewm is at fault. I have argued and proven this so often before. Icewm takes standards conformance very seriously. Applications usually run with known and unknown bugs. Some of those are triggered when running under icewm, but it is still the application's bug, which must be reported to the application developers. It just wastes our time here. Feel free to make a donation to compensate for that.

freonheat commented 10 months ago

I believe we now know how it is possible that IceWM could be interacting with Firefox to cause our issue. I am far from certain it is the cause or explanation, or whether the problem is ultimately Firefox or not, but it is very interesting... IceWM is acting differently when mouse clicks are in a window that has focus gained by clicking on the window vs. focus gained by clicking on the taskbar icon.

If a window has focus by clicking on the taskbar, subsequent clicks inside that window have no activity in icewm. However, if a window has focus by first giving focus to a different window, then clicking inside the original window to re-gain focus, every single click inside the original window results in these two events on the down-click (and nothing on the click release):

    Leave Grab Virtual Nofocus
    Enter Ungrab Virtual Nofocus

These extra events could be going to Firefox and changing the behavior on how it acts on that menu. Doesn't matter which application is used as an example, the extra events are happening this way on all applications. It is just that most applications don't seem to care.

A longer/full example follows:

    === Problem behavior demonstration
    = Move mouse over firefox window
    38:30.325: 0x0a0002c: Enter Normal NonlinearVirtual Nofocus
    = Click column drop-down menu arrow on web application {web app fails desired behavior, menu never appears}
    38:31.815: 0x0a0002c: Leave Grab Virtual Nofocus
    38:31.825: 0x0a0002c: _NET_WM_STATE = _NET_WM_STATE_FOCUSED
    38:31.832: 0x0a0002c: Focus Normal Nonlinear
    38:31.832: 0x0a0002c: Enter Ungrab Virtual Focus
    38:31.840: 0x0a0002c: WM_HINTS Input Normal Group(10485761)
    38:31.843: 0x0a0002c: Defocus Normal Inferior
    = Click on it again
    38:37.029: 0x0a0002c: Leave Grab Virtual Nofocus
    38:37.030: 0x0a0002c: Enter Ungrab Virtual Nofocus
    = Click on it again
    38:37.589: 0x0a0002c: Leave Grab Virtual Nofocus
    38:37.590: 0x0a0002c: Enter Ungrab Virtual Nofocus

    === Non-problematic behavior demonstration
    = Move mouse to taskbar, then click taskbar firefox button
    41:04.092: 0x0a0002c: _NET_WM_STATE = _NET_WM_STATE_FOCUSED
    41:04.096: 0x0a0002c: Focus Normal Nonlinear
    41:04.102: 0x0a0002c: WM_HINTS Input Normal Group(10485761)
    41:04.102: 0x0a0002c: Defocus Normal Inferior
    = Move mouse over firefox window
    41:06.950: 0x0a0002c: Enter Normal NonlinearVirtual Nofocus
    = Click column drop-down menu arrow on web application {web app acts as it should, showing the menu selected}
    <nothing at all returned by icesh, no matter how many clicks>
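For anyone who wants to check a longer capture for the same pattern, a rough Python sketch follows. It assumes the line shape seen in the icesh spy output in this thread (timestamp, window id, event text); the function name `find_grab_crossings` is made up for this example and is not part of icesh.

```python
import re

# Line shape taken from the icesh spy output quoted in this thread, e.g.
#   "38:31.815: 0x0a0002c: Leave Grab Virtual Nofocus"
LINE_RE = re.compile(r"^\s*(\d+:\d+\.\d+):\s+(0x[0-9a-fA-F]+):\s+(.*)$")


def find_grab_crossings(log_text):
    """Return (timestamp, window) for each 'Leave Grab' that is later
    answered by an 'Enter Ungrab' on the same window."""
    pending = {}  # window id -> timestamp of the unanswered 'Leave Grab'
    pairs = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # annotation lines such as "= Click on it again"
        stamp, window, event = match.groups()
        if event.startswith("Leave Grab"):
            pending[window] = stamp
        elif event.startswith("Enter Ungrab") and window in pending:
            pairs.append((pending.pop(window), window))
    return pairs


sample = """\
= Click on it again
38:37.029: 0x0a0002c: Leave Grab Virtual Nofocus
38:37.030: 0x0a0002c: Enter Ungrab Virtual Nofocus
= Click on it again
38:37.589: 0x0a0002c: Leave Grab Virtual Nofocus
38:37.590: 0x0a0002c: Enter Ungrab Virtual Nofocus
"""
pairs = find_grab_crossings(sample)
print(pairs)
```

Each pair corresponds to one click that produced the extra crossing events; a capture of the taskbar-focus case should yield an empty list.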

Hopefully this makes sense. If not, I can try to create a video or something, if that would be helpful or wanted.

gijsbers commented 10 months ago

Nah, all totally irrelevant. I have become totally uninterested with discussing this problem any further.

The only thing I want from you is a long unedited icesh spy output on that LibreOffice 7 dialog with the previous nonzero setting of GTK_OVERLAY_SCROLLING. At least a hundred lines. That could in fact be something upon which I can improve.

freonheat commented 10 months ago

> Nah, all totally irrelevant.

We are trying to understand more about IceWM and icesh spy. Could you please explain why a window that already has focus would cause "Leave Grab Virtual Nofocus" and "Enter Ungrab Virtual Nofocus" with every click inside the window while not moving outside the window?

> a long unedited icesh spy output on that LibreOffice 7 dialog with the previous nonzero setting of GTK_OVERLAY_SCROLLING. At least a hundred lines.

OK, I will supply that shortly...

freonheat commented 10 months ago

> OK, I will supply that shortly...

File attached. This is LibreOffice 7.5.5 with VCL=gtk3 with GTK_OVERLAY_SCROLLING=0 in Writer, Tools> Autocorrect> Autocorrect Options> Replace (tab) then used scrollwheel on mouse to scroll the list. Each movement of scroll causes the window to get larger and larger. icesh_out.txt

EDIT: I just realized that isn't with non-zero on GTK_OVERLAY_SCROLLING. I am not sure the purpose of supplying one unset, though, since the errant behavior didn't happen with GTK_OVERLAY_SCROLLING unset (and I have never used anything except unset and set to 0).

freonheat commented 10 months ago

> We are trying to understand more about IceWM and icesh spy. Could you please explain why a window that already has focus would cause "Leave Grab Virtual Nofocus" and "Enter Ungrab Virtual Nofocus" with every click inside the window while not moving outside the window?

More experimentation and I discovered it also has to do with IceWM layers. If firefox (or any window) is on a higher or lower layer than the other windows, the problem does not occur (I don't see the "extra" events from IceWM). Also, no problem if window is selected from the window list (just like no problem when selecting window from taskbar).

gijsbers commented 10 months ago

Thanks a lot for your icesh_out.txt. It shows that LibreOffice constantly increases the WM_NORMAL_HINTS MinSize field to a value that is larger than the current dialog window dimension. That is a bug in LibreOffice, so I don't have to do anything about it.
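The check described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the regular expressions follow the WM_NORMAL_HINTS and Configure line shapes shown earlier in this thread, the function name `min_size_overruns` is invented here, and the sample sizes are made up rather than taken from icesh_out.txt.

```python
import re

MIN_SIZE_RE = re.compile(r"WM_NORMAL_HINTS.*MinSize\((\d+),(\d+)\)")
CONFIGURE_RE = re.compile(r"Configure\s+0x[0-9a-fA-F]+\s+(\d+)x(\d+)")


def min_size_overruns(log_lines):
    """Yield (min_size, current_size) whenever a MinSize hint is larger than
    the most recent size reported by a Configure event."""
    current = None
    for line in log_lines:
        configure = CONFIGURE_RE.search(line)
        if configure:
            current = (int(configure.group(1)), int(configure.group(2)))
            continue
        hint = MIN_SIZE_RE.search(line)
        if hint and current is not None:
            wanted = (int(hint.group(1)), int(hint.group(2)))
            if wanted[0] > current[0] or wanted[1] > current[1]:
                yield wanted, current


# Made-up sample in the same line format; the sizes are illustrative only.
sample = [
    "00:01.000: 0x2e00b87: Configure 0x2e00b87 1072x556+0+0",
    "00:01.500: 0x2e00b87: WM_NORMAL_HINTS MinSize(1080,556) Base(0,0)",
    "00:01.502: 0x2e00b87: Configure 0x2e00b87 1080x556+0+0",
    "00:02.000: 0x2e00b87: WM_NORMAL_HINTS MinSize(1080,556) Base(0,0)",
]
overruns = list(min_size_overruns(sample))
print(overruns)
```

A steadily growing list of overruns would match the pattern of an application repeatedly asking for a minimum size larger than the window it currently has.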

freonheat commented 10 months ago

I finally found another place with similar focus behavior issues. Still not a public web site, but it is FOSS. It is inside the Xen Orchestra web panel- when trying to use a "console" for a VM, if the Firefox window is selected for focus by clicking in it after another window had focus, the "console" cannot gain keyboard control. But if I click on the taskbar button to gain focus for FF, then the "console" will work fine. Still researching.

freonheat commented 10 months ago

I finally regression tested this to Firefox 110 as the first version with the problem showing. Updated the first post to reflect that.

gijsbers commented 10 months ago

3 weeks ago I said to report it to mozilla, but you wouldn't listen. Your boss could have saved a lot of money. Next time start with a hefty donation. Much cheaper.

freonheat commented 10 months ago

> 3 weeks ago I said to report it to mozilla, but you wouldn't listen.

I haven't gathered enough information yet, was busy with other stuff. Plus, without some public site to point to, I am not sure it will get any attention yet. Unfortunately, still have found only two examples, one a private site, and the other something someone has to install.

I did discover right after my posting today, that one of my testing methods was wrong, and it is NOT specific to just IceWM, there are others. So that greatly expands the scope. Will update again with that info and a mozilla bugreport soon. Might help anyone else who runs across it.

> Your boss could have saved a lot of money. Next time start with a hefty donation. Much cheaper.

My boss? Money? Not sure what you are talking about.

freonheat commented 10 months ago

Good news!

First, I found a public site that anyone can access that clearly illustrates the problem: https://www.truenas.com/docs/scale

Go there and if you select some other window, then firefox (by clicking in firefox) you can no longer use the "TrueNAS Scale" pulldown (or others) and successfully select an option from the menu. It just won't work. Only if you click on the taskbar button for Firefox to give it focus, will those menus work.

Then, after tons of searching, I finally found a relevant mozilla bug report: https://bugzilla.mozilla.org/show_bug.cgi?id=1831400

There is a workaround: changing the about:config setting "widget.gtk.ignore-bogus-leave-notify" from its default of "2" to "1". That seems to solve the problem with those window managers that display it. I am not sure exactly what it does, but it works for Firefox (at least 115+) under IceWM.

Also interesting is this information: https://phabricator.services.mozilla.com/D174368 Where it appears people have been adding code to firefox to actually try to detect which window manager and change behavior based on that? Not clear what is going on.

So it still isn't apparent to me exactly what the bug actually is or in what. But it at least looks like there is a possible workaround?
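As a rough guide to what the three pref values appear to mean, here is a small Python model of the decision. It is based only on the Firefox patch excerpt quoted further down in this thread, so it may not match later Firefox versions. The function name and the "icewm" identifier string are assumptions made for this example.

```python
# Window manager identifiers listed in the patch excerpt quoted later in this thread.
KNOWN_BOGUS_WMS = ("fluxbox", "blackbox")


def should_check_bogus_leave(pref_value, desktop_env, is_wayland=False):
    """Model of the tri-state pref: 0 = never check, 1 = always check,
    anything else (default 2) = check only on the listed window managers."""
    if pref_value == 0:
        return False
    if pref_value == 1:
        return True
    if is_wayland:
        return False  # the excerpt never treats Wayland sessions as bogus
    return desktop_env in KNOWN_BOGUS_WMS or desktop_env.startswith("fvwm")


# "icewm" is an assumed identifier string, used here only for illustration.
print(should_check_bogus_leave(2, "icewm"))  # default pref value
print(should_check_bogus_leave(1, "icewm"))  # workaround pref value
```

Under this reading, the default value leaves the check off for any window manager missing from the list, which would be consistent with the workaround of forcing the value to 1.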

freonheat commented 10 months ago

More info....

[Bug 1805939] - Ignore bogus leave-notify events on known-broken environments

https://hg.mozilla.org/integration/autoland/rev/382f0839b989

+// Some window managers send a bogus top-level leave-notify event on every
+// click. That confuses our event handling code in ways that can break websites,
+// see bug 1805939 for details.
+//
+// Make sure to only check this on bogus environments, since for environments
+// with CSD, gdk_device_get_window_at_position could return the window even when
+// the pointer is in the decoration area.
+static bool IsBogusLeaveNotifyEvent(GdkWindow* aWindow,
+                                    GdkEventCrossing* aEvent) {
+  static bool sBogusWm = [] {
+    if (GdkIsWaylandDisplay()) {
+      return false;
+    }
+    const auto& desktopEnv = GetDesktopEnvironmentIdentifier();
+    return desktopEnv.EqualsLiteral("fluxbox") ||   // Bug 1805939 comment 0.
+           desktopEnv.EqualsLiteral("blackbox") ||  // Bug 1805939 comment 32.
+           StringBeginsWith(desktopEnv, "fvwm"_ns);
+  }();
+
+  const bool shouldCheck = [] {
+    switch (StaticPrefs::widget_gtk_ignore_bogus_leave_notify()) {
+      case 0:
+        return false;
+      case 1:
+        return true;
+      default:
+        return sBogusWm;
+    }
+  }();

gijsbers commented 10 months ago

The leave notify events are due to button bindings installed by calls to XGrabButton. The various events are explained in https://tronche.com/gui/x/xlib/events/window-entry-exit/normal.html. Maybe the problem is that GTK is designed for Gnome which may not use XGrabButton. Then if non-Gnome applications use it, situations like this may occur. I fail to understand why you report this here. This forum is about IceWM development.
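To illustrate how the crossing events caused by a grab differ from ordinary ones, here is a hedged Python sketch. The numeric mode values are those defined in X11's X.h; the `Crossing` class and the `pointer_really_left` helper are hypothetical and only model the idea, they are not GTK, Firefox, or Xlib code.

```python
from dataclasses import dataclass

# Crossing-event mode values as defined in X11's X.h.
NOTIFY_NORMAL = 0
NOTIFY_GRAB = 1
NOTIFY_UNGRAB = 2


@dataclass
class Crossing:
    """Hypothetical stand-in for the XCrossingEvent fields used here."""
    kind: str  # "enter" or "leave"
    mode: int  # one of the NOTIFY_* values above


def pointer_really_left(event: Crossing) -> bool:
    """True only for a leave that was not generated by a grab
    activating or being released."""
    return event.kind == "leave" and event.mode == NOTIFY_NORMAL


# The two events seen on every click in the spy output earlier in this thread.
leave_grab = Crossing("leave", NOTIFY_GRAB)      # "Leave Grab Virtual Nofocus"
enter_ungrab = Crossing("enter", NOTIFY_UNGRAB)  # "Enter Ungrab Virtual Nofocus"
print(pointer_really_left(leave_grab), pointer_really_left(enter_ungrab))
```

The mode field is what lets a client tell a grab-induced leave apart from the pointer actually moving out of the window.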

freonheat commented 10 months ago

> Maybe the problem is that GTK is designed for Gnome which may not use XGrabButton. Then if non-Gnome applications use it, situations like this may occur.

Possible. But the problem doesn't exist in KDE or Cinnamon or several others I tried. Perhaps they are doing something similar to adjust. I don't know.

> I fail to understand why you report this here. This forum is about IceWM development.

Because it is affecting apps running inside IceWM. Because Mozilla appears to be blaming it on window managers sending extra events. Those are the extra events I discovered and reported earlier in this thread:

"Could you please explain why a window that already has focus would cause "Leave Grab Virtual Nofocus" and "Enter Ungrab Virtual Nofocus" with every click inside the window while not moving outside the window?"

Perhaps that behavior was once normal and is becoming abnormal. I would check, but I don't know if I can test the unaffected window managers to see what they send, because they might not have a nice utility like icesh spy.

Although all I have found so far is Firefox not liking the behavior, it is likely affecting or will start affecting other applications as well, ones that will not have a built-in workaround that can be set. Reading through all the various long Mozilla threads, it seems the behavior was brought in with GTK and showed up when they made a change to stop ignoring the extra events (around version 110, they dropped the old workaround because it was causing problems with newer window managers that didn't need it anymore). From a Mozilla developer:

"It's a WM bug, effectively, see bug 1822911. The context is that we had code to work around these since forever, and that broke mouse leave on modern WMs, so when I removed it well... we found all these issues."

For me, this isn't about blame (Firefox, GTK, IceWM), but understanding and accommodating. Would it make sense for IceWM to have an optional setting to suppress the extra events and act more like the window managers that [presumably] don't send the extra events? And if so, would that negatively affect any other applications?

gijsbers commented 9 months ago

The WMs don't send these events. It's the X11 server that sends them. You could ask the X11 server not to send them if you don't like them? Maybe they can add an option to the X11 server? But it really is a lack of understanding of the GTK developers how events and button bindings work and an unwillingness to report this to them.

gijsbers commented 9 months ago

> Because it is affecting apps running inside IceWM.

No, it is affecting apps which are built with faulty libraries.

> Because Mozilla appears to be blaming it on window managers sending extra events.

appears? WMs don't send these extra events. WMs use the well-defined X11 facilities that some developers fail to grasp.

freonheat commented 9 months ago

> Although all I have found so far is Firefox not liking the behavior, it is likely affecting or will start affecting other applications as well, ones that will not have a built-in workaround that can be set.

Well, that didn't take long. Chromium/Chrome also have the same problem when run under IceWM, with apparently no workaround.

gijsbers commented 9 months ago

Maybe this?

    ClientWindowMouseActions=0
    FocusOnClickClient=0
    RaiseOnClickClient=0

PetteriAimonen commented 6 months ago

@gijsbers I can confirm that setting all three settings fixes the problem, but unfortunately it also makes switching between windows very annoying.

An interesting datapoint is that running Firefox inside xpra somehow doesn't suffer from this problem.

Edit: Oh, and to aid reproduction: I'm getting this with the hover menus on https://eleshop.eu/ and https://digikey.com/ . At first they work; then, after switching to another window and back, clicking on the hover menus immediately closes them and takes you to the wrong place.

PetteriAimonen commented 6 months ago

Here are some more clues:

When I add a print to YFrameWindow::setWinFocus():

    fprintf(stderr, "setWinFocus canRaise:%d, overlapped:%d\n", (int)canRaise(), (int)overlapped());
    if (!raiseOnClickClient || !canRaise() || !overlapped())
        container()->releaseButtons();

it always prints canRaise:1, overlapped:1, except when restoring a minimized window when it prints 0 for both.

My interpretation is that for some reason the window is not yet really the topmost at the point when setWinFocus() gets called. It therefore leaves the grab in effect, expecting to raise the window when it gets clicked. But in reality it becomes the top window soon after.
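The condition can be restated as a tiny Python model to show why the grab stays in place in the click-to-focus case. This only mirrors the boolean expression quoted above; the function name is made up, and `raise_on_click_client` is assumed to be true because that setting is not part of the debug output.

```python
def releases_buttons(raise_on_click_client, can_raise, overlapped):
    """Mirror of the condition quoted above from setWinFocus(): the button
    grab is released unless a click could still raise the window."""
    return (not raise_on_click_client) or (not can_raise) or (not overlapped)


# Values printed by the debug line above. raise_on_click_client is assumed
# True here, since the RaiseOnClickClient setting is not shown in that output.
after_click_focus = releases_buttons(True, can_raise=True, overlapped=True)
after_restore = releases_buttons(True, can_raise=False, overlapped=False)
print(after_click_focus, after_restore)
```

With canRaise and overlapped both reported as 1, the model keeps the grab, which matches the extra events on each click; with both at 0 after restoring from minimized, the grab is released.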

I have tried mucking with the order of actions in YClientContainer::handleButton but it has no effect.

It appears that the window order in YList that is used to determine overlapped() result is only updated after FocusIn event is received in YWindow::handleEvent. But by that point setWinFocus() has already completed and the buttons were left grabbed.

As a workaround, removing the if condition if (!raiseOnClickClient || !canRaise() || !overlapped()) from wmframe.cc makes things work for me. It probably would cause problems in configurations where setting RaiseOnFocus is not active.

I'm not sure what the correct fix is, maybe the releaseButtons() test should be redone after FocusIn event is received?