cnr-isti-vclab / meshlab

The open source mesh processing system
http://www.meshlab.net
GNU General Public License v3.0

[Linux] MeshLab flatpak 2021.10 with Wayland enabled breaks application #1177

Status: Open. neildarlow opened this issue 2 years ago.

neildarlow commented 2 years ago

I have been asked by the flathub packagers to open an issue here relating to this flathub issue: https://github.com/flathub/net.meshlab.MeshLab/issues/24

Please ask for any additional information you require. I will hold off updating my MeshLab flatpak so that I can perform any additional tests you may require.

christian-rauch commented 2 years ago

Can you share

  1. screenshots of the issue (There are screenshots in the flatpak issue, but I would like to have them here too for completeness. In the flatpak issue, the "background" seems to be black, so it does not appear as if the 3D view is transparent, just blank.)
  2. the output of inxi -GSC -xx to get more system information (driver, compositor, ...)

Thanks.

neildarlow commented 2 years ago

As I mentioned in the flathub issue, the screenshot doesn't show the background correctly. I've attached a photograph of the problem below.

Regarding inxi: the flatpak environment doesn't contain that binary. I can run it on the host, but that won't provide information on the flatpak runtimes in use, e.g. the org.kde.Platform version (5.15, which isn't the latest), only on the ones required for the GNOME desktop.

[Photograph of the problem: meshlab-image]

neildarlow commented 2 years ago

Here is the inxi output requested. As this is Fedora Silverblue, it will only be indicative of the host environment.

System:
  Host: vivobook Kernel: 5.15.10-200.fc35.x86_64 x86_64 bits: 64 compiler: gcc
    v: 2.37-10.fc35 Console: pty pts/0 wm: gnome-shell DM: GDM
    Distro: Fedora release 35 (Thirty Five)
CPU:
  Info: Quad Core model: 11th Gen Intel Core i5-1135G7 bits: 64 type: MT MCP
    arch: Tiger Lake rev: 1 cache: L1: 320 KiB L2: 5 MiB L3: 8 MiB
  flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
    bogomips: 38707
  Speed: 465 MHz min/max: 400/4200 MHz Core speeds (MHz): 1: 465 2: 3378
    3: 738 4: 2371 5: 2203 6: 3527 7: 2379 8: 767
Graphics:
  Device-1: Intel TigerLake-LP GT2 [Iris Xe Graphics] vendor: ASUSTeK
    driver: i915 v: kernel bus-ID: 0000:00:02.0 chip-ID: 8086:9a49
  Device-2: IMC Networks USB2.0 HD UVC WebCam type: USB driver: uvcvideo
    bus-ID: 1-6:2 chip-ID: 13d3:56a2
  Display: server: X.Org 1.21.1.4 compositor: gnome-shell driver: loaded: i915
    note: n/a (using device driver) resolution: 1920x1080~60Hz s-dpi: 96
  OpenGL: renderer: Mesa Intel Xe Graphics (TGL GT2) v: 4.6 Mesa 21.3.2
    direct render: Yes

christian-rauch commented 2 years ago

indicative of the host environment

What is the host environment? Are you running the applications in a VM? Is the host environment the one that runs directly on the hardware ("bare metal")?

Did you run inxi -GSC -xx? Your output is missing some information that is shown on my system. E.g. your output is not showing Desktop:, Display: and driver:. Can you also format the output using markdown syntax for code? This makes it easier to read.

neildarlow commented 2 years ago

I am running Fedora Workstation 35, Silverblue edition.

This is an immutable OS (everyone runs the same underlying image) that can be overlaid with RPMs (sparingly); applications are installed as flatpaks.

Flatpaks are self-contained application images that use runtimes to provide common libraries etc. MeshLab is one such application, and it runs in its own sandboxed environment, i.e. it has its own /usr and /etc trees. The runtimes supply binaries in /usr/bin, while executables from the flatpak itself reside in /app/bin, within the flatpak space.

You cannot think of a flatpak execution environment as a traditional GNU/Linux installation. Likewise, debugging a flatpak is unlike debugging with traditional methods. I think you will have to work alongside the flatpak packagers on this one. The end user has little control over the execution environment other than being able to modify a few access permissions.

The inxi output I supplied was exactly what you asked for (even executed as root) on the host. As I mentioned, the flatpak runtimes contribute much additional software to the execution environment, which you cannot observe from the underlying OS command line. If you want inxi output from within the flatpak environment, it will have to be bundled into the flatpak so that it has access to the /usr provided by the runtimes.

christian-rauch commented 2 years ago

As far as I understand, running inxi inside the flatpak / runtime does not make sense because this is the same runtime everyone with that MeshLab flatpak is using. That means that everyone is using the same GLEW, Qt, etc.

I was looking for differences outside the flatpak to see what might cause the different behaviour. If yours is showing the transparent 3D issue and mine is running correctly with the same flatpak, then it must be related to the driver and/or compositor (or something else?). I am using the Intel driver with an older Mesa 21.0.3 and an older GNOME Shell version. I would wait until someone else runs into the same issue. I will try with newer GNOME Shell and Mesa versions once I upgrade to Ubuntu 22.04.

neildarlow commented 2 years ago

The flathub packagers have reverted the graphics interface from Wayland to X11 for the time being. They will try again with Wayland at the next GLEW update. The main thing is that everyone should be able to run MeshLab from flatpak now without this issue.

alemuntoni commented 2 years ago

Hi everybody, sorry for the late reply. I don't know if the MeshLab flatpak uses the version of GLEW that we bundle in MeshLab or one from another package. I just noticed that a new version of GLEW, 2.2, was released recently (one year ago, actually), while the version bundled in MeshLab is 2.1. Could this be the cause of the issue?

neildarlow commented 2 years ago

Looking at the flatpak build files it appears that MeshLab is built with a shared-module GLEW at version 2.2.0.

Is GLEW the only dependency stopping MeshLab running correctly under a Wayland session?

christian-rauch commented 2 years ago

I still think that GLEW is not the reason for the different behaviour we observed. I was using the same flatpak with all its runtime dependencies, such as GLEW, and MeshLab worked as expected on Wayland with an Intel GPU. This issue might also appear without flatpak at all.

@neildarlow Did you ever try to compile MeshLab locally and did you observe the same issue on Wayland?

But we only have two "samples" for now. Maybe we should publish a flatpak beta with Wayland support enabled and ask people to test it on Wayland and report issues, together with information about the OS, drivers, etc., to the flathub issue tracker.

alemuntoni commented 2 years ago

I am now running Wayland with the NVIDIA drivers on my new laptop, and after building MeshLab, it crashed at startup with the message:

Warning: Ignoring XDG_SESSION_TYPE=wayland on Gnome. Use QT_QPA_PLATFORM=wayland to run on Wayland anyway.
qt.qpa.plugin: Could not load the Qt platform plugin "xcb" in "" even though it was found.
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vnc, wayland-egl, wayland, wayland-xcomposite-egl, wayland-xcomposite-glx, xcb.

The missing qt plugin was causing the crash. I then installed the package libxcb-xinerama0 as suggested here and meshlab runs fine from my build now. However, the warning about ignoring the XDG_SESSION_TYPE env variable is still there. I don't know if any of these infos could help on solving the flatpak issue...

christian-rauch commented 2 years ago

However, the warning about ignoring the XDG_SESSION_TYPE env variable is still there.

You have to use QT_QPA_PLATFORM=wayland meshlab to run MeshLab on Wayland.

The warning is there because, on GNOME, Qt still prefers X11 (via Xwayland) over Wayland when searching for a compatible QPA (Qt Platform Abstraction) plugin. This can be avoided, and Wayland enforced, by explicitly setting QT_QPA_PLATFORM=wayland. The Qt inside the flatpak removed that fallback and always uses Wayland when available.
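The fallback rule can be modelled in a few lines. The following is a simplified Python sketch, assuming Qt 5.x behaviour on Linux as implied by the "Ignoring XDG_SESSION_TYPE=wayland on Gnome" warning quoted earlier; it is not Qt's actual source, and the function name `choose_qpa_platform` and the `xcb` default are illustrative assumptions.

```python
import os
from typing import Mapping


def choose_qpa_platform(env: Mapping[str, str], default: str = "xcb") -> str:
    """Simplified model (assumption) of how Qt 5 picks a QPA plugin on Linux.

    An explicit QT_QPA_PLATFORM always wins. Otherwise XDG_SESSION_TYPE is
    consulted, except that a Wayland session on GNOME is ignored and the
    X11 (xcb) default is kept.
    """
    explicit = env.get("QT_QPA_PLATFORM")
    if explicit:
        return explicit

    session = env.get("XDG_SESSION_TYPE", "").lower()
    if session == "x11":
        return "xcb"
    if session == "wayland":
        desktops = (env.get("XDG_CURRENT_DESKTOP", "") + ":"
                    + env.get("XDG_SESSION_DESKTOP", "")).lower()
        if "gnome" in desktops:
            # Corresponds to the "Ignoring XDG_SESSION_TYPE=wayland on Gnome" warning.
            return default
        return "wayland"
    return default


if __name__ == "__main__":
    print(choose_qpa_platform(os.environ))
```

With XDG_SESSION_TYPE=wayland on a GNOME desktop, the model picks `xcb`, which matches MeshLab silently running on X11 until QT_QPA_PLATFORM=wayland is set.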

If in doubt, you can install xeyes (sudo apt install x11-apps) and hover the mouse over a window. If the eyes move, it is an X11 window. If the eyes don't move, it is a Wayland window. This also demonstrates the "keylogging" issue with X11 apps.

I don't know if any of these infos could help on solving the flatpak issue...

Well, it would always help if the flatpak and the "native" non-flatpak version can be compared to see if an issue is caused by the flatpak runtime or by something else.

alemuntoni commented 2 years ago

Oh ok, therefore meshlab was running with x11. When setting QT_QPA_PLATFORM=wayland, no window is shown at all, but the process is running...

christian-rauch commented 2 years ago

When setting QT_QPA_PLATFORM=wayland, no window is shown at all, but the process is running...

Can you run inxi -GSC -xx on your system and pass the output here?

alemuntoni commented 2 years ago

here it is:

System:
  Host: XPS-15-9510 Kernel: 5.13.0-19-generic x86_64 bits: 64 compiler: gcc
    v: 11.2.0 Desktop: GNOME 40.5 tk: GTK 3.24.30 wm: gnome-shell dm: GDM3
    Distro: Ubuntu 22.04 (Jammy Jellyfish)
CPU:
  Info: 8-core model: 11th Gen Intel Core i7-11800H bits: 64 type: MT MCP
    arch: Tiger Lake rev: 1 cache: L1: 640 KiB L2: 10 MiB L3: 24 MiB
  Speed (MHz): avg: 1237 high: 3372 min/max: 800/4600 cores: 1: 991 2: 910
    3: 955 4: 918 5: 1029 6: 1042 7: 1021 8: 3372 9: 1371 10: 801 11: 1223
    12: 2026 13: 1088 14: 1028 15: 1007 16: 1024 bogomips: 73728
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
Graphics:
  Device-1: Intel TigerLake-H GT1 [UHD Graphics] vendor: Dell driver: i915
    v: kernel bus-ID: 0000:00:02.0 chip-ID: 8086:9a60
  Device-2: NVIDIA GA107M [GeForce RTX 3050 Ti Mobile] vendor: Dell
    driver: nvidia v: 495.46 bus-ID: 0000:01:00.0 chip-ID: 10de:25a0
  Device-3: Microdia Integrated_Webcam_HD type: USB driver: uvcvideo
    bus-ID: 3-11:3 chip-ID: 0c45:672e
  Display: wayland server: X.Org 1.21.1.3 compositor: gnome-shell driver:
    loaded: modesetting,nvidia unloaded: fbdev,nouveau,vesa
    resolution: 3840x2400~60Hz s-dpi: 96
  OpenGL: renderer: Mesa Intel UHD Graphics (TGL GT1) v: 4.6 Mesa 21.2.2
    direct render: Yes

christian-rauch commented 2 years ago

Your system has two GPUs. Do you know which one MeshLab runs on? Maybe this is related to the new NVIDIA driver. Or the desktop somehow gets confused about where to run the OpenGL rendering and where to show the window. I only tested this with an Intel iGPU without a secondary discrete GPU. I actually haven't used Wayland on NVIDIA yet.

alemuntoni commented 2 years ago

I think it was actually using the Intel graphics card. I just noticed that the default NVIDIA X Server setting was "NVIDIA on demand"... I'll try switching to NVIDIA to see if anything changes.

alemuntoni commented 2 years ago

So, when setting the x server to NVIDIA, meshlab does not start (process running, but no window shown), with and without the env variable QT_QPA_PLATFORM=wayland set. But actually, even QtCreator does not start:

$ /opt/Qt/Tools/QtCreator/bin/qtcreator
qt.qpa.plugin: Could not find the Qt platform plugin "wayland" in ""
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

Available platform plugins are: eglfs, linuxfb, minimal, minimalegl, offscreen, vkkhrdisplay, vnc, xcb.

Aborted (core dumped)

christian-rauch commented 2 years ago

when setting the x server to NVIDIA

Hm, on Wayland there should be no X server, only a Wayland compositor. It could be that there is an X server and a Wayland compositor running in parallel. In this case, clients would first try to connect to the Wayland compositor. If this is on another screen (Ctrl+Alt+F1...F12) then you will not see it.

You can test this with a simple Wayland client. If you run weston-terminal in a terminal and no window is shown, it means that it is connected to a Wayland compositor on another screen (switch with Ctrl+Alt+F1...F12).

To make sure that your desktop is indeed running on Wayland, you can check the "About" tab in the settings. Under "Windowing System" it should say "Wayland".
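Besides the "About" tab, the session type can usually be read from the environment of a terminal started inside the session. A minimal sketch, assuming the desktop exports the freedesktop variables XDG_SESSION_TYPE, WAYLAND_DISPLAY and DISPLAY (GNOME and KDE normally do, but this is not guaranteed); the helper name `describe_session` is hypothetical.

```python
import os
from typing import Mapping


def describe_session(env: Mapping[str, str]) -> str:
    """Best-effort guess of the windowing system from environment variables."""
    session = env.get("XDG_SESSION_TYPE", "").lower()
    has_wayland = bool(env.get("WAYLAND_DISPLAY"))
    has_x11 = bool(env.get("DISPLAY"))

    if session == "wayland" or (not session and has_wayland):
        # On a Wayland session DISPLAY is usually set too, pointing at Xwayland.
        return "wayland (Xwayland available)" if has_x11 else "wayland"
    if session == "x11" or has_x11:
        return "x11"
    return "unknown"


if __name__ == "__main__":
    print(describe_session(os.environ))
```

Note that this only reports what the shell session advertises; which plugin a given Qt application actually uses still depends on QT_QPA_PLATFORM, as discussed above.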

alemuntoni commented 2 years ago

Just to be clear, these are the options I have in the NVIDIA settings:

[Screenshot: NVIDIA settings options]

But I am not sure what they actually set in a Wayland session...

I tested weston-terminal, and it works as expected (it appears on the right screen), and the "About" tab correctly reports Wayland as the windowing system.

It appears that Qt applications just don't work when QT_QPA_PLATFORM=wayland is set. Even QtCreator crashes at startup. And I can recognize all the apps that are running on X11 over Wayland by moving their windows to a second, non-HiDPI monitor, because they are scaled 2x there.

christian-rauch commented 2 years ago

These settings only affect the X server and X11 clients. I don't know how NVIDIA Optimus / Prime works on Wayland. I noticed in GNOME Shell that when I right-click on an application icon in the dash or in the application overview, there is an option to "Launch using Dedicated Graphics Card". Maybe this changes the behaviour on Wayland?

If weston-terminal works, then at least the general Wayland session works. I guess all your GTK applications (file manager, terminal, gedit, ...) work as expected too? Maybe this is indeed an issue with Qt on multi-GPU setups only. It will take a while until I can switch to the upcoming Ubuntu 22.04 on my Optimus / Prime system to test this in person.

neildarlow commented 2 years ago

Hi,

Is the QT_QPA_PLATFORM=Wayland environment variable the switch for using the QtWayland plugin? And is the use of the QtWayland plugin transparent to a Qt application?

Excuse my ignorance, but I have no experience of Qt application development or structure whatsoever, and I'm wondering if we should be asking the question: Is MeshLab a Wayland-ready application? It's OK if it isn't, but we need to identify whether that's the case.

christian-rauch commented 2 years ago

Is the QT_QPA_PLATFORM=Wayland environment variable the switch for using the QtWayland plugin?

Yes, this will force Qt to use the wayland QPA plugin. These plugins are transparent to the application. I think there are private APIs to interact with the underlying windowing system. But usually, applications should interact with all QPA plugins in the same way. Setting this environment variable might not work with the flatpak because it runs in a sandbox, but Qt in the KDE flatpak SDK already has Wayland enabled by default.

Is MeshLab a Wayland-ready application?

According to my experience: YES. According to your experience: NO. This comes down to the question if there are Wayland issues with MeshLab :-) Qt itself is "Wayland ready". I created a CloudCompare flatpak (org.cloudcompare.CloudCompare) which also uses Qt and runs without issues (for me) on Wayland.

The point of this very issue is to figure out what problems with Wayland are left to be solved and if those are caused by MeshLab itself or by the flatpak.

christian-rauch commented 2 years ago

I was able to reproduce this on the same hardware setup with a newer GNOME Shell (https://gitlab.gnome.org/GNOME/mutter/-/issues/2095). I was also able to reproduce this with a minimal MeshLab build without flatpak.

Build minimal MeshLab:

cmake -B build src -GNinja -DBUILD_MESHLAB_MINI=ON
cmake --build build

Run with wayland:

QT_QPA_PLATFORM=wayland ./build/distrib/meshlab

So this is not a flatpak-specific issue; it is somehow caused by newer GNOME Shell versions.
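To compare both backends with the minimal build above, a small launcher can start the same binary once per QPA plugin. This is a hypothetical helper script (the file name `compare_qpa.py` and its functions are illustrative, not part of MeshLab); it assumes the binary sits at ./build/distrib/meshlab, as in the commands above.

```python
import os
import subprocess
import sys
from typing import Dict, List, Mapping

# Binary produced by the minimal build above (adjust if your build dir differs).
MESHLAB = "./build/distrib/meshlab"


def qpa_env(base: Mapping[str, str], platform: str) -> Dict[str, str]:
    """Return a copy of `base` that forces Qt onto one QPA plugin, e.g. "wayland" or "xcb"."""
    env = dict(base)
    env["QT_QPA_PLATFORM"] = platform
    return env


def launch_all(platforms: List[str]) -> None:
    """Start one MeshLab instance per QPA plugin and wait until all are closed."""
    procs = [subprocess.Popen([MESHLAB], env=qpa_env(os.environ, p)) for p in platforms]
    for proc in procs:
        proc.wait()


if __name__ == "__main__":
    if os.access(MESHLAB, os.X_OK):
        # e.g. `python3 compare_qpa.py wayland xcb`
        launch_all(sys.argv[1:] or ["wayland", "xcb"])
    else:
        print(f"{MESHLAB} not found; run the cmake commands above first")
```

Having the two windows side by side makes it easy to see whether the transparent 3D view only appears with the wayland plugin.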

neildarlow commented 2 years ago

So, is this a case of a newer GNOME shell breaking things or the flatpak build missing some element that's also missing in your minimal build?

Maybe following the path of what is needed to make the minimal build run correctly might give some clue as to whether the flatpak does have some omission?

christian-rauch commented 2 years ago

So, is this a case of a newer GNOME shell breaking things or the flatpak build missing some element that's also missing in your minimal build?

No. As I said, this is not a flatpak-specific issue. This already happens with a minimal MeshLab build (BUILD_MESHLAB_MINI=ON). I've chosen the minimal configuration to build faster in case someone wants to reproduce this. You can build the full MeshLab, but the effect should be the same.

This is an issue with how MeshLab, via Qt, interacts with the desktop shell. It might be that GNOME Shell broke something in a recent update. But it could also be that GNOME Shell corrected some protocol behaviour that Qt wrongly relied on.

Maybe following the path of what is needed to make the minimal build run correctly might give some clue as to whether the flatpak does have some omission?

Correct. This is why I was asking all the questions to figure out if this is a MeshLab or flatpak issue.

christian-rauch commented 2 years ago

This seems to be a client issue (Qt or MeshLab) after all. I can reproduce the described behaviour or similar graphical issues on other desktops/shells/compositors (KDE and sway). Even if this gets a workaround in GNOME, it will cause graphical issues on other desktops.

I still haven't figured out why this happens with MeshLab but not with other Qt software that renders 3D, like CloudCompare. @alemuntoni Do you have any idea which part of Qt or which widget might cause this? A minimal Qt example that reproduces this would be helpful for the Qt devs. This issue is easily reproducible in a virtual machine with the most recent GNOME or KDE; there is no need to run it on real hardware.

alemuntoni commented 2 years ago

I have an idea of what the issue could be, but I am not sure. MeshLab makes heavy use of QGLWidget, which has been obsolete for a while. We were planning to replace it with the new QOpenGLWidget (also because that is mandatory if we want to switch to Qt 6 in the future), but we kept procrastinating because the code that uses that class is a mess. I am not sure if that could be the cause (or part of it), but if there is a chance that it is, it would be a nice motivation to finally begin the transition to QOpenGLWidget...

christian-rauch commented 2 years ago

MeshLab makes heavy use of QGLWidget, which has been obsolete for a while. We were planning to replace it with the new QOpenGLWidget

That's a good point. CloudCompare uses the QOpenGLWidget and replaced QGLWidget a long time ago. I think that there is a high chance of working around this by switching from the deprecated QGLWidget to the new QOpenGLWidget. If there are any bugs in QGLWidget, I doubt they will ever be fixed.

christian-rauch commented 2 years ago

The GNOME Shell devs pointed me to the Qt bug: https://bugreports.qt.io/browse/QTBUG-86229. @alemuntoni There seems to be a workaround using these DontCreateNative flags. But I am not sure if this is really the same issue.

alemuntoni commented 2 years ago

Thanks, @christian-rauch! In the next few days I'll start replacing all the old QGL classes (it is something we need to do anyway).

I still cannot test properly meshlab running wayland natively, therefore I have no way to check if the workaround you suggested works... Anyway, I still have to work on some other things and then I'll focus on this issue, to hopefully solve it soon!

neildarlow commented 2 years ago

@alemuntoni As the original reporter, I will be happy to test anything you produce. At the moment I can reproduce the problem simply by disabling X11 and enabling Wayland in the standard flatpak using Flatseal.

christian-rauch commented 2 years ago

I still cannot test properly meshlab running wayland natively, therefore I have no way to check if the workaround you suggested works

As an alternative to running the program in the current desktop session, you can run a nested Wayland session via mutter --wayland --nested. This creates a new socket, wayland-1, that clients can connect to by setting WAYLAND_DISPLAY=wayland-1 (e.g. WAYLAND_DISPLAY=wayland-1 meshlab). The name wayland-1 is assigned dynamically here; if you start the nested Wayland session from an X11 session, it will be wayland-0. You can list all standard Wayland sockets via ls -al $XDG_RUNTIME_DIR/wayland-*.
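The socket listing can also be scripted, e.g. to find the nested session's socket automatically. A minimal sketch, assuming sockets follow the standard wayland-N naming in $XDG_RUNTIME_DIR; the helper names `socket_index` and `wayland_sockets` are hypothetical. The wayland-N.lock files that sit next to each socket are filtered out, and entries are ordered numerically so that wayland-10 sorts after wayland-2.

```python
import os
from typing import Iterable, List, Optional

PREFIX = "wayland-"


def socket_index(name: str) -> Optional[int]:
    """Return N for a socket name "wayland-N", or None for anything else (e.g. lock files)."""
    suffix = name[len(PREFIX):]
    if name.startswith(PREFIX) and suffix.isdigit():
        return int(suffix)
    return None


def wayland_sockets(names: Iterable[str]) -> List[str]:
    """Keep only "wayland-N" entries, ordered by N."""
    found = [n for n in names if socket_index(n) is not None]
    return sorted(found, key=lambda n: socket_index(n) or 0)


if __name__ == "__main__":
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR", "")
    sockets = wayland_sockets(os.listdir(runtime_dir)) if os.path.isdir(runtime_dir) else []
    print(sockets)
    if len(sockets) > 1:
        # The nested session usually gets the next free number, e.g. wayland-1.
        print(f"try: WAYLAND_DISPLAY={sockets[-1]} QT_QPA_PLATFORM=wayland meshlab")
```

QT_QPA_PLATFORM=wayland is included in the suggested command because, on GNOME, a Qt 5 build would otherwise fall back to X11 and never connect to the nested compositor.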

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. The resources of the VCLab team are limited, and so we are asking for your help. If this is a bug and you can still reproduce this error on the last release of MeshLab, please reply with all of the information you have about it in order to keep the issue open. If this is a feature request, and you feel that it is still relevant and valuable, please tell us why. This issue will automatically be closed in the near future if no further activity occurs. Thank you for all your contributions.

alemuntoni commented 2 years ago

Right now, we are updating the codebase of meshlab in order to switch to Qt6. After switching to Qt6 and removing the usage of old deprecated classes, we will make meshlab work properly on wayland.

rurban commented 2 months ago

With Fedora 40 and meshlab-2023.12-3.fc40.rpm:

$ meshlab
QSocketNotifier: Can only be used with threads started with QThread
Using OpenGL 4.6
qt.qpa.wayland: Wayland does not support QWindow::requestActivate()

with the same symptoms as described above (transparent main window)

Note that fc41 will remove the OpenGL X11 fallback, so it will no longer be possible to use the X11 fallback

WAYLAND_DISPLAY=not-exist meshlab

which currently prints:

Failed to create wl_display (No such file or directory)
qt.qpa.plugin: Could not load the Qt platform plugin "wayland" in "" even though it was found.

The X11 fallback on Wayland is unusable anyway, as mouse/trackpad OpenGL movements in the main window do not work.