brainvisa / casa-distro

Unified development environment for BrainVISA projects.

Qt applications crash under X2Go in the casa-run-5.3 image #321

Closed: ylep closed this issue 1 year ago

ylep commented 1 year ago

Describe the bug

Qt applications crash under X2Go in the casa-run-5.3 image, with error messages about xcb:

qt.qpa.gl: QXcbConnection: Failed to initialize GLX
[...]
qt.qpa.xcb: QXcbConnection: XCB error: 1 (BadRequest), sequence: 169, resource id: 408, major code: 130 (Unknown), minor code: 47

To Reproduce

Steps to reproduce the behavior:

  1. Open an X2Go session
  2. Launch any Qt program: anatomist, brainvisa, or even bv
  3. Resize the window a few times to trigger refresh events; the application soon freezes, and the whole X2Go session may even stop responding, partially or totally.

Environment:

Additional context

The issue appears because X2Go is based on a very outdated X server code base. I will try to implement the workaround described here: https://wiki.x2go.org/doku.php/wiki:development:glx-xlib-workaround

This workaround involves re-incorporating a software-rendering libGL into the image.
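Assuming the linked workaround boils down to making the dynamic loader pick up a software-rendering libGL.so before any other one, a launcher could achieve this by prepending that library's directory to LD_LIBRARY_PATH. The sketch below only illustrates the idea; the directory /opt/software-libgl and the function with_software_libgl are hypothetical, not actual casa-distro paths or APIs.

```python
import os

# Hypothetical location of a software-rendering libGL.so inside the image.
SOFTWARE_LIBGL_DIR = "/opt/software-libgl"


def with_software_libgl(env=None, libgl_dir=SOFTWARE_LIBGL_DIR):
    """Return a copy of env in which libgl_dir comes first on LD_LIBRARY_PATH.

    The dynamic loader searches LD_LIBRARY_PATH before the default system
    directories, so a libGL.so placed in libgl_dir shadows the system one.
    The input mapping (os.environ by default) is not modified.
    """
    new_env = dict(os.environ if env is None else env)
    current = new_env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries (an empty component means "current directory")
    # and any previous occurrence of libgl_dir, to avoid duplicates.
    parts = [p for p in current.split(os.pathsep) if p and p != libgl_dir]
    new_env["LD_LIBRARY_PATH"] = os.pathsep.join([libgl_dir] + parts)
    return new_env
```

For example, subprocess.run(["anatomist"], env=with_software_libgl()) would start anatomist with the software libGL shadowing the system one, as long as the binary does not pin another libGL through an RPATH.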

ylep commented 1 year ago

I tried reusing the old libGL.so that we had in the casa-run-5.0 image, with mixed results:

I will now try to build a libGL within the casa-dev-5.3 image to see if it gives better results.

ylep commented 1 year ago

I get the same behaviour with an up-to-date libGL (Mesa 20.0.5) built under Ubuntu 22.04; the error messages persist:

qt.qpa.gl: QXcbConnection: Failed to initialize GLX
qt.qpa.xcb: QXcbConnection: XCB error: 1 (BadRequest), sequence: 169, resource id: 393, major code: 130 (Unknown), minor code: 47
qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 16503, resource id: 8406920, major code: 40 (TranslateCoords), minor code: 0
qt.qpa.xcb: QXcbConnection: XCB error: 3 (BadWindow), sequence: 16647, resource id: 8407011, major code: 40 (TranslateCoords), minor code: 0

ylep commented 1 year ago

I just realized that the crash happens on one of my X2Go servers, but not on the other. I have to redo all the tests, which will have to wait until tomorrow.

Both servers run the same Ubuntu 18.04 and the same MATE desktop environment, but they have different versions of x2goserver, different graphics cards, and different NVIDIA drivers:

ylep commented 1 year ago

Things are starting to get clearer: the crash is related to the version of X2Go. x2goserver-4.0.0.0-3 exhibits the crash, x2goserver-4.1.0.3-0~1708~ubuntu18.04.1 does not.
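If one wanted to flag the affected servers, the installed version can be read on the server with dpkg-query -W -f='${Version}' x2goserver and compared against the 4.1 line. The sketch below uses a deliberately simplified comparison of the leading numeric upstream components (it is not a full Debian version comparison), the threshold reflects only the two versions tested here, and both function names are hypothetical.

```python
import re


def upstream_tuple(debian_version):
    """Leading numeric upstream components of a Debian version string.

    e.g. "4.1.0.3-0~1708~ubuntu18.04.1" -> (4, 1, 0, 3)
    """
    # Strip an optional epoch ("1:") and the Debian revision after the last "-".
    version = debian_version.split(":", 1)[-1]
    upstream = version.rsplit("-", 1)[0]
    match = re.match(r"\d+(?:\.\d+)*", upstream)
    if match is None:
        raise ValueError(f"cannot parse version: {debian_version!r}")
    return tuple(int(x) for x in match.group(0).split("."))


def x2go_may_crash(debian_version, first_good=(4, 1)):
    """True for x2goserver versions older than first_good (simplified check)."""
    return upstream_tuple(debian_version) < first_good
```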

Interestingly, the error messages persist under the more recent X2Go, even though there is no crash. Also, glxinfo and glxgears fail under both versions of X2Go with opengl=container or opengl=nv:

qt.qpa.gl: QXcbConnection: Failed to initialize GLX
qt.qpa.xcb: QXcbConnection: XCB error: 1 (BadRequest), sequence: 169, resource id: 393, major code: 130 (Unknown), minor code: 47
$ glxinfo
name of display: :50
X Error of failed request:  GLXUnsupportedPrivateRequest
  Major opcode of failed request:  143 (GLX)
  Minor opcode of failed request:  17 (X_GLXVendorPrivateWithReply)
  Serial number of failed request:  27
  Current serial number in output stream:  27
$ glxgears
X Error of failed request:  GLXUnsupportedPrivateRequest
  Major opcode of failed request:  143 (GLX)
  Minor opcode of failed request:  17 (X_GLXVendorPrivateWithReply)
  Serial number of failed request:  32
  Current serial number in output stream:  32

With a software-only libGL

Using a software-only libGL as in https://github.com/brainvisa/casa-distro/pull/322:

Summary

denisri commented 1 year ago

I agree.

ylep commented 1 year ago

Software-only OpenGL has been pushed in casa-run-5.3-13.sif and casa-dev-5.3-14.sif.

Here are my next steps for solving this issue:

denisri commented 1 year ago

OK, if the test is fast. In many situations we run these containers many times for quite short tasks, so the OpenGL test should not take a noticeable amount of time (or it should only be done in graphical sessions where an X server is connected, or something similar).
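One way to honour both constraints is to probe GLX only for graphical launches, and only when DISPLAY is set. The sketch below assumes that a failing glxinfo exits with a non-zero status (Xlib's default error handler exits after printing an "X Error of failed request" message like the ones above); glx_works and choose_opengl_mode are hypothetical names, and "software" stands for the opengl=software mode discussed in this thread.

```python
import os
import shutil
import subprocess


def glx_works(timeout=5.0):
    """Probe GLX on $DISPLAY with "glxinfo -B" (brief output).

    Returns True if glxinfo succeeds, False if it fails or hangs,
    and None if the probe cannot run (no X display or no glxinfo).
    """
    if not os.environ.get("DISPLAY"):
        return None  # no X server connected: skip the probe entirely
    if shutil.which("glxinfo") is None:
        return None  # glxinfo is not installed
    try:
        result = subprocess.run(
            ["glxinfo", "-B"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed if it exceeds this
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def choose_opengl_mode(default_mode, gui):
    """Keep default_mode unless this is a GUI launch and the GLX probe failed."""
    if gui and glx_works() is False:
        return "software"
    return default_mode
```

Because the probe only runs when gui is true, command-line tools pay no extra cost; GUI launches pay the glxinfo startup time.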

ylep commented 1 year ago

glxinfo is a really slow command: a quick test gives about 500 ms of overhead, whereas the current overhead of running a no-op such as bv true on my machine is 421 ms. That overhead may be acceptable when launching a full-fledged GUI (brainvisa or anatomist), but it is too much for simple command-line tools.
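To compare such overheads more repeatably than with a single run, one could take the median of several wall-clock measurements. This is a generic sketch (median_runtime_ms is a hypothetical helper), not the method used to obtain the 500 ms and 421 ms figures above.

```python
import statistics
import subprocess
import time


def median_runtime_ms(cmd, runs=5):
    """Median wall-clock time, in milliseconds, of running cmd to completion.

    Output is discarded and a non-zero exit status is not treated as an
    error, so a failing glxinfo can be timed as well.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=False)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For example, median_runtime_ms(["bv", "true"]) and median_runtime_ms(["glxinfo", "-B"]) could be compared within the same X2Go session.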

Unless we modify bv to e.g. set gui=True mode for GUI commands only, this is not really a practical solution.

sapetnioc commented 1 year ago

What do you think of an option such as gui=remote, which would be like gui=True but would also launch glxinfo to avoid problems with X2Go?

denisri commented 1 year ago

It's possible, but it would not be automatic, which makes it almost the same as using opengl=software (and thus a bit redundant). The question was rather whether we could detect this situation automatically, without the user having to bother about it, but it seems to have a cost...

ylep commented 1 year ago

Considering that the glxinfo autodetection would only fix an issue with a really old version of X2Go (4.0), I don't think it is worth it. BrainVISA is not the only software to have problems under X2Go 4.0...