ArcticaProject / nx-libs

nx-libs
Other
120 stars 39 forks source link

Rootless mode: Qt applications do not exit #1065

Open xavierog opened 1 year ago

xavierog commented 1 year ago

This looks like a Qt bug, but since it only happens with nxagent -R and not with nxagent, I would like to discuss it here before reporting the issue to the Qt organisation.

What is happening? It seems Qt applications work fine with nxagent as a nested X server, with or without the -R (rootless) option... until they need to exit, at which point their windows disappear but the process seems to hang, possibly waiting for something from nxagent.

How to reproduce?

A small "hello" button should appear. Clicking it is supposed to make the application exit. It works fine in normal mode but in rootless mode, the process will not exit.

The application outputs some warnings on stderr right after it starts (and long before it even tries to exit). So far there is no reason to believe they are related with this issue.

qt.qpa.xcb: X server does not support XInput 2
QStandardPaths: XDG_RUNTIME_DIR not set, defaulting to '/tmp/runtime-username'
qt.qpa.gl: QXcbConnection: Failed to initialize GLX
qt.qpa.xcb: QXcbConnection: XCB error: 1 (BadRequest), sequence: 169, resource id: 553, major code: 130 (Unknown), minor code: 47

A brief inspection of the stacks hints that thread 1 is waiting for thread 2 and thread 2 is waiting for something from the X server:

Backtrace for thread 1 ``` (gdb) bt #0 __futex_abstimed_wait_common64 (private=0, cancel=true, abstime=0x0, op=393, expected=0, futex_word=0x55fd549cb500) at ./nptl/futex-internal.c:57 #1 __futex_abstimed_wait_common (futex_word=futex_word@entry=0x55fd549cb500, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0, cancel=cancel@entry=true) at ./nptl/futex-internal.c:87 #2 0x00007fc6d9ea31bb in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x55fd549cb500, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=private@entry=0) at ./nptl/futex-internal.c:139 #3 0x00007fc6d9ea5818 in __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x55fd549cb4b0, cond=0x55fd549cb4d8) at ./nptl/pthread_cond_wait.c:503 #4 ___pthread_cond_wait (cond=0x55fd549cb4d8, mutex=0x55fd549cb4b0) at ./nptl/pthread_cond_wait.c:618 #5 0x00007fc6da4dfbfb in QWaitConditionPrivate::wait (deadline=..., this=0x55fd549cb4b0) at thread/qwaitcondition_unix.cpp:146 #6 QWaitCondition::wait (this=this@entry=0x55fd549d59d8, mutex=mutex@entry=0x55fd549d59b8, deadline=...) at thread/qwaitcondition_unix.cpp:225 #7 0x00007fc6da4d9cf9 in QThread::wait (this=this@entry=0x55fd549cc680, deadline=...) at ../../include/QtCore/../../src/corelib/thread/qmutex.h:270 #8 0x00007fc6d6ceceb9 in QXcbEventQueue::~QXcbEventQueue (this=0x55fd549cc680, __in_chrg=) at ./src/plugins/platforms/xcb/qxcbeventqueue.cpp:105 #9 0x00007fc6d6cecf19 in QXcbEventQueue::~QXcbEventQueue (this=0x55fd549cc680, __in_chrg=) at ./src/plugins/platforms/xcb/qxcbeventqueue.cpp:116 #10 0x00007fc6d6cc82f5 in QXcbConnection::~QXcbConnection (this=0x55fd549e0770, __in_chrg=) at ./src/plugins/platforms/xcb/qxcbconnection.cpp:143 #11 0x00007fc6d6cc8b49 in QXcbConnection::~QXcbConnection (this=0x55fd549e0770, __in_chrg=) at ./src/plugins/platforms/xcb/qxcbconnection.cpp:155 #12 0x00007fc6d6cc97be in qDeleteAll::const_iterator> (end=..., begin=...) at ../../../../include/QtCore/../../src/corelib/tools/qalgorithms.h:320 #13 qDeleteAll > (c=...) at ../../../../include/QtCore/../../src/corelib/tools/qalgorithms.h:328 #14 QXcbIntegration::~QXcbIntegration (this=0x55fd549d5d60, __in_chrg=) at ./src/plugins/platforms/xcb/qxcbintegration.cpp:227 #15 0x00007fc6d6cc98b9 in QXcbIntegration::~QXcbIntegration (this=0x55fd549d5d60, __in_chrg=) at ./src/plugins/platforms/xcb/qxcbintegration.cpp:229 #16 0x00007fc6d9731027 in QGuiApplicationPrivate::~QGuiApplicationPrivate (this=0x55fd549cb9c0, __in_chrg=) at kernel/qguiapplication.cpp:1724 #17 0x00007fc6dab61859 in QApplicationPrivate::~QApplicationPrivate (this=0x55fd549cb9c0, __in_chrg=) at kernel/qapplication.cpp:163 #18 0x000055fd53cc42f6 in main (argc=1, argv=0x7ffdbff8ebe8) at main.cpp:9 ```
Backtrace for thread 2 ``` (gdb) thread 2 [Switching to thread 2 (Thread 0x7fc6d60b96c0 (LWP 594953))] (gdb) bt #0 0x00007fc6d9f199ef in __GI___poll (fds=fds@entry=0x7fc6d60b8cc8, nfds=nfds@entry=1, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29 #1 0x00007fc6d8e41d12 in poll (__timeout=-1, __nfds=1, __fds=0x7fc6d60b8cc8) at /usr/include/x86_64-linux-gnu/bits/poll2.h:47 #2 _xcb_conn_wait (c=c@entry=0x55fd549e1f10, cond=cond@entry=0x55fd549e1f50, vector=vector@entry=0x0, count=count@entry=0x0) at ../../src/xcb_conn.c:508 #3 0x00007fc6d8e4407a in xcb_wait_for_event (c=0x55fd549e1f10) at ../../src/xcb_in.c:703 #4 0x00007fc6d6cec840 in QXcbEventQueue::run (this=0x55fd549cc680) at ./src/plugins/platforms/xcb/qxcbeventqueue.cpp:228 #5 0x00007fc6da4d9ffd in operator() (__closure=) at thread/qthread_unix.cpp:351 #6 (anonymous namespace)::terminate_on_exception > (t=...) at thread/qthread_unix.cpp:288 #7 QThreadPrivate::start (arg=0x55fd549cc680) at thread/qthread_unix.cpp:311 #8 0x00007fc6d9ea63ec in start_thread (arg=) at ./nptl/pthread_create.c:444 #9 0x00007fc6d9f26a1c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81 ```

Questions:

Ionic commented 1 year ago

You probably will need to look at Qt code.

It looks like the second thread is an X11 event loop, which isn't weird at all. It also doesn't mean that it's actually waiting for a specific event (I guess?), it's just part of the normal X11 event handling. Now, obviously, this thread should terminate once you try to quit an application, but why that is not happening beats me without looking at Qt code.

xavierog commented 1 year ago

Thank you for this first (and extremely fast) feedback :)

Ionic commented 1 year ago

Looking around, it seems that quite some people have had issues with newer Qt versions and more exotic X servers (mostly on Windows, like XMing), which lead to the application window not showing up at all. They also got a BadRequest with codes {130,47}, but that might be a red herring.

The major code should be related to the MIT-SHM/XShm extension, minor code 47 is... I don't know. X_ShmPutImage should be 3, which is usually failing, I'm having a hard time finding out what 47 is supposed to be. It's probably something that we don't provide in NX, so it makes sense that Qt gets a BadRequest.

But if you don't see this message right when quitting applications, but rather right at the start, it probably means that it has nothing todo with the quitting issue.

Ionic commented 1 year ago

Weird stuff indeed. There is no minor opcode 47 in https://gitlab.freedesktop.org/xorg/proto/xorgproto/-/blob/master/include/X11/extensions/shmproto.h?ref_type=heads , which makes the BadRequest response very understandable... now... what did Qt try to call exactly?

The XInput2 extension does have a minor opcode of 47, which would be XIQueryVersion: https://gitlab.freedesktop.org/xorg/proto/xorgproto/-/blob/master/include/X11/extensions/XI2proto.h?ref_type=heads#L82

I think we're not providing XInput2 at all in NX.

The combination used here doesn't make sense, though...

xavierog commented 1 year ago

that might be a red herring.

Yes, most likely.

But if you don't see this message right when quitting applications, but rather right at the start, it probably means that it has nothing todo with the quitting issue.

I mentioned these warnings for the sake of completeness but, indeed, I should have mentioned they appear right at the start. I am going to adjust my report accordingly.

I think we're not providing XInput2 at all in NX.

Confirmed, I remember having checked that point.

uli42 commented 1 year ago

Unfortunately I am having no working environment ATM where I could test. So my first suggested is to retry with some extensions disabled. Please check -extension and -noshm and the like in nxagent manage.

Nevertheless for me it looks like a bug in qt appearing in a code path that is not run on "normal" X servers providing all current extensions.

Uli

Xavier G. @.***> schrieb am So., 10. Sept. 2023, 07:53:

that might be a red herring.

Yes, most likely.

But if you don't see this message right when quitting applications, but rather right at the start, it probably means that it has nothing todo with the quitting issue.

I mentioned these warnings for the sake of completeness but, indeed, I should have mentioned they appear right at the start. I am going to adjust my report accordingly.

I think we're not providing XInput2 at all in NX.

Confirmed, I remember having checked that point.

— Reply to this email directly, view it on GitHub https://github.com/ArcticaProject/nx-libs/issues/1065#issuecomment-1712722970, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQHBZEA3TZIXOTJH4ER7O3XZVIXDANCNFSM6AAAAAA4R4ADGQ . You are receiving this because you are subscribed to this thread.Message ID: @.***>

xavierog commented 1 year ago

I captured the X11 traffic between qtdeadsimple and nxagent. When Qt wishes to close the connection, it emits a SendEvent request along with a ClientMessage, the type of which is "not a predefined atom" (according to Wireshark) but most likely the actual value of QXcbAtom::_QT_CLOSE_CONNECTION: https://code.qt.io/cgit/qt/qtbase.git/tree/src/plugins/platforms/xcb/qxcbeventqueue.cpp#n348 It then waits for the associated Sent-ClientMessage event before cleaning up the place (CloseFont, FreeCursor, FreeColormap, FreeGC) and closing the underlying connection. This works fine in regular mode. However, in rootless mode, something prevents nxagent from emitting this Sent-ClientMessage. Next step: investigate why nxagent behaves differently in rootless mode.

Ionic commented 1 year ago

The most visible difference between "normal" (desktop) mode and rootless mode is that in the first case, a window manager is running within the session, while in the latter case the windows are decorated through an external window manager. Maybe atoms are not exchanged correctly.

uli42 commented 1 year ago

In that case you should get the same problem when running on a normal X server without window manager. Can you confirm?

Having written that something comes to my mind: In order to implement rootless mode NX needs to map atoms between the local xserver and nxagent. At this point some atoms are filtered and never mapped, partly for performance reasons, partly because they would cause problems. You could try and compare xlsatoms inside and outside the session if there are any differences regarding Q* atoms.

nxagent can be built using debug flags in Rootless.c. That would show any atom mappings going on, which might shine some more light on the issue.

Uli

On Thu, Sep 14, 2023 at 6:07 AM Mihai Moldovan @.***> wrote:

The most visible difference between "normal" (desktop) mode and rootless mode is that in the first case, a window manager is running within the session, while in the latter case the windows are decorated through an external window manager. Maybe atoms are not exchanged correctly.

— Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you commented.Message ID: @.***>

xavierog commented 1 year ago

In that case you should get the same problem when running on a normal X server without window manager. Can you confirm?

I killed my window manager and ran qtdeadsimple against my X.org server: it exited as soon as I pressed the button, i.e. it behaved normally.

Having written that something comes to my mind: [...]

Thanks for the hint, I will dive into all of this until I reach a diagnosis.

xavierog commented 1 year ago

According to Atom map in context [nxagentResetAtomMap: Exiting] (and other calls to nxagentPrintAtomMapInfo()), _QT_CLOSE_CONNECTION never enters the atom map despite existing (with different values) in both local and remote servers. However, I believe this is a consequence, not a cause.

So I got back to my previous reasoning, which is: in rootless mode, the Qt-issued SendEvent X11 request with a custom-type ClientMessage is seemingly ignored. Based on the function name, I assumed SendEvent requests are processed by the ProcSendEvent function:

https://github.com/ArcticaProject/nx-libs/blob/3ef7845746c4ec1ac75825ccebc17168e0400cfa/nx-X11/programs/Xserver/hw/nxagent/NXevents.c#L419-L444

In Events.c, ForwardClientMessage acts only if: https://github.com/ArcticaProject/nx-libs/blob/3ef7845746c4ec1ac75825ccebc17168e0400cfa/nx-X11/programs/Xserver/hw/nxagent/Events.c#L4502-L4510 ... which explains why nxagent ignores Qt's SendEvent with a _QT_CLOSE_CONNECTION ClientMessage only in rootless mode (and only #ifdef NXAGENT_CLIPBOARD).

I tested the following quick-and-dirty workaround:

--- nx-X11/programs/Xserver/hw/nxagent/NXevents.c
+++ nx-X11/programs/Xserver/hw/nxagent/NXevents.c
@@ -144,6 +144,7 @@ of the copyright holder.
 #include "Windows.h"
 #include "Args.h"
 #include "Clipboard.h"
+#include "Atoms.h"

 extern Display *nxagentDisplay;

@@ -427,8 +428,11 @@ ProcSendEvent(ClientPtr client)

     if (nxagentOption(Rootless) && stuff->event.u.u.type == ClientMessage)
     {
-        ForwardClientMessage(client, stuff);
-        return Success;
+        Atom qt_close_connection = MakeAtom("_QT_CLOSE_CONNECTION", strlen("_QT_CLOSE_CONNECTION"), False);
+        if (stuff->event.u.clientMessage.u.l.type != qt_close_connection) {
+            ForwardClientMessage(client, stuff);
+            return Success;
+        }
     }

     if (stuff -> event.u.u.type == SelectionNotify)

... and Qt applications now exit as expected in rootless mode. Wireshark confirms the expected Sent-ClientMessage Event was issued by nxagent and received by the Qt Application.

At this stage, I believe we have a complete diagnosis. Next step: discuss how to best tackle the issue.

uli42 commented 1 year ago

Great, thank you for digging deeper! That finding matches my suspicion. However, while this fixes your specific issue we need to find out if there are more exceptions required...

Uli

Xavier G. @.***> schrieb am Sa., 23. Sept. 2023, 19:12:

According to Atom map in context [nxagentResetAtomMap: Exiting] (and other calls to nxagentPrintAtomMapInfo()), _QT_CLOSE_CONNECTION never enters the atom map despite existing (with different values) in both local and remote servers. However, I believe this is a consequence, not a cause.

So I got back to my previous reasoning, which is: in rootless mode, the Qt-issued SendEvent X11 request with a custom-type ClientMessage is seemingly ignored. Based on the function name, I assumed SendEvent requests are processed by the ProcSendEvent function:

https://github.com/ArcticaProject/nx-libs/blob/3ef7845746c4ec1ac75825ccebc17168e0400cfa/nx-X11/programs/Xserver/hw/nxagent/NXevents.c#L419-L444

In Events.c, ForwardClientMessage acts only if:

https://github.com/ArcticaProject/nx-libs/blob/3ef7845746c4ec1ac75825ccebc17168e0400cfa/nx-X11/programs/Xserver/hw/nxagent/Events.c#L4502-L4510 ... which explains why nxagent ignores Qt's SendEvent with a _QT_CLOSE_CONNECTION ClientMessage.

I tested the following quick-and-dirty workaround:

--- nx-X11/programs/Xserver/hw/nxagent/NXevents.c+++ nx-X11/programs/Xserver/hw/nxagent/NXevents.c@@ -144,6 +144,7 @@ of the copyright holder.

include "Windows.h"

include "Args.h"

include "Clipboard.h"+#include "Atoms.h"

extern Display *nxagentDisplay; @@ -427,8 +428,11 @@ ProcSendEvent(ClientPtr client)

 if (nxagentOption(Rootless) && stuff->event.u.u.type == ClientMessage)
 {-        ForwardClientMessage(client, stuff);-        return Success;+        Atom qt_close_connection = MakeAtom("_QT_CLOSE_CONNECTION", strlen("_QT_CLOSE_CONNECTION"), False);+        if (stuff->event.u.clientMessage.u.l.type != qt_close_connection) {+            ForwardClientMessage(client, stuff);+            return Success;+        }
 }

 if (stuff -> event.u.u.type == SelectionNotify)

... and Qt applications now exit as expected in rootless mode. Wireshark confirms the expected Sent-ClientMessage Event was issued by nxagent and received by the Qt Application.

At this stage, I belive we have a complete diagnosis. Next step: discuss how to best tackle the issue.

— Reply to this email directly, view it on GitHub https://github.com/ArcticaProject/nx-libs/issues/1065#issuecomment-1732367361, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQHBZB6GOI6WBHC2VFBH2DX34J77ANCNFSM6AAAAAA4R4ADGQ . You are receiving this because you commented.Message ID: @.***>

xavierog commented 1 year ago

Here is a different fix: instead of systematically returning, ProcSendEvent() returns only if ForwardClientMessage() acknowledges it did something (or tried to do something) with the message.

--- nx-X11/programs/Xserver/hw/nxagent/Events.h
+++ nx-X11/programs/Xserver/hw/nxagent/Events.h
@@ -222,6 +222,6 @@ int nxagentPendingEvents(Display *dpy);

 int nxagentWaitEvents(Display *, useconds_t msec);

-void ForwardClientMessage(ClientPtr client, xSendEventReq *stuff);
+int ForwardClientMessage(ClientPtr client, xSendEventReq *stuff);

 #endif /* __Events_H__ */
--- nx-X11/programs/Xserver/hw/nxagent/Events.c
+++ nx-X11/programs/Xserver/hw/nxagent/Events.c
@@ -4499,7 +4499,7 @@ int nxagentWaitEvents(Display *dpy, usec
   return 1;
 }

-void ForwardClientMessage(ClientPtr client, xSendEventReq *stuff)
+int ForwardClientMessage(ClientPtr client, xSendEventReq *stuff)
 {
     Atom netwmstate = MakeAtom("_NET_WM_STATE", strlen("_NET_WM_STATE"), False);
     Atom wmchangestate = MakeAtom("WM_CHANGE_STATE", strlen("WM_CHANGE_STATE"), False);
@@ -4542,7 +4542,7 @@ void ForwardClientMessage(ClientPtr clie
                 #endif
             }
             else
-                return; // ERROR!
+                return -1; // ERROR!

             #ifdef DEBUG
             fprintf(stderr, "%s: window [0x%lx]\n", __func__, X.xclient.window);
@@ -4562,8 +4562,10 @@ void ForwardClientMessage(ClientPtr clie
             fprintf(stderr, "%s: send to window [0x%lx]\n", __func__, dest);
             fprintf(stderr, "%s: return Status [%d]\n", __func__, stat);
             #endif
+            return 1; // Processed/Forwarded/Sent
         }
     }
+    return 0; // Ignored
 }

 #ifdef NX_DEBUG_INPUT
--- nx-X11/programs/Xserver/hw/nxagent/NXevents.c
+++ nx-X11/programs/Xserver/hw/nxagent/NXevents.c
@@ -427,8 +427,8 @@ ProcSendEvent(ClientPtr client)

     if (nxagentOption(Rootless) && stuff->event.u.u.type == ClientMessage)
     {
-        ForwardClientMessage(client, stuff);
-        return Success;
+        if (ForwardClientMessage(client, stuff) != 0)
+            return Success;
     }

     if (stuff -> event.u.u.type == SelectionNotify)
uli42 commented 1 year ago

Thanks. I have broken equipment ATM so I cannot have a closer look. But I will as soon as possible!

Uli

On Thu, Sep 28, 2023 at 1:51 AM Xavier G. @.***> wrote:

Here is a different fix: instead of systematically returning, ProcSendEvent() returns only if ForwardClientMessage() acknowledges it did something (or tried to do something) with the message.

--- nx-X11/programs/Xserver/hw/nxagent/Events.h+++ nx-X11/programs/Xserver/hw/nxagent/Events.h@@ -222,6 +222,6 @@ int nxagentPendingEvents(Display *dpy);

int nxagentWaitEvents(Display , useconds_t msec); -void ForwardClientMessage(ClientPtr client, xSendEventReq stuff);+int ForwardClientMessage(ClientPtr client, xSendEventReq *stuff);

endif / __Events_H__ /--- nx-X11/programs/Xserver/hw/nxagent/Events.c+++ nx-X11/programs/Xserver/hw/nxagent/Events.c@@ -4499,7 +4499,7 @@ int nxagentWaitEvents(Display *dpy, usec

return 1; } -void ForwardClientMessage(ClientPtr client, xSendEventReq stuff)+int ForwardClientMessage(ClientPtr client, xSendEventReq stuff) { Atom netwmstate = MakeAtom("_NET_WM_STATE", strlen("_NET_WM_STATE"), False); Atom wmchangestate = MakeAtom("WM_CHANGE_STATE", strlen("WM_CHANGE_STATE"), False);@@ -4542,7 +4542,7 @@ void ForwardClientMessage(ClientPtr clie

endif

         }
         else-                return; // ERROR!+                return -1; // ERROR!

         #ifdef DEBUG
         fprintf(stderr, "%s: window [0x%lx]\n", __func__, X.xclient.window);@@ -4562,8 +4562,10 @@ void ForwardClientMessage(ClientPtr clie
         fprintf(stderr, "%s: send to window [0x%lx]\n", __func__, dest);
         fprintf(stderr, "%s: return Status [%d]\n", __func__, stat);
         #endif+            return 1; // Processed/Forwarded/Sent
     }
 }+    return 0; // Ignored

}

ifdef NX_DEBUG_INPUT--- nx-X11/programs/Xserver/hw/nxagent/NXevents.c+++ nx-X11/programs/Xserver/hw/nxagent/NXevents.c@@ -427,8 +427,8 @@ ProcSendEvent(ClientPtr client)

 if (nxagentOption(Rootless) && stuff->event.u.u.type == ClientMessage)
 {-        ForwardClientMessage(client, stuff);-        return Success;+        if (ForwardClientMessage(client, stuff) != 0)+            return Success;
 }

 if (stuff -> event.u.u.type == SelectionNotify)

— Reply to this email directly, view it on GitHub https://github.com/ArcticaProject/nx-libs/issues/1065#issuecomment-1738252019, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABQHBZA47DCQ6OBBIH7AC7TX4S3XTANCNFSM6AAAAAA4R4ADGQ . You are receiving this because you commented.Message ID: @.***>

uli42 commented 5 months ago

The problem I just mentioned in #1015 might be related to this. We both issues should be fixable in one go.