Closed jonnyrobbie closed 6 years ago
Thanks for your report. Unfortunately, I don’t have an nvidia GPU using the binary blob driver, nor do I use kwin, so I’ll need your help to investigate it.
It would be great if you could bisect epoxy between the 1.4.3 and 1.5.0 tags, to identify the regression. Reading the issues you linked seems to point to glvnd, but it would be good to be sure.
Additionally, it would be good to understand what the regression actually is, i.e. why the nvidia driver breaks when epoxy tests for the glvnd interface first.
I've added a testing request to the bug:
@nwnk should we add epoxyinfo to epoxy proper, as a debugging tool?
OK, interesting. I tried running git bisect. The catch is that 1.5.0 did not even build: part of Arch's PKGBUILD is a check() routine that runs meson check, and that step fails.
1.5.0 failed
ninja: Entering directory `/var/cache/AUR/build/libepoxy/src/build'
ninja: no work to do.
1/18 header_guards OK 0.01 s
2/18 misc_defines OK 0.01 s
3/18 khronos_typedefs OK 0.01 s
4/18 egl_has_extension_nocontext OK 0.02 s
5/18 egl_gl SKIP 0.04 s
6/18 egl_gles1_without_glx SKIP 0.02 s
7/18 egl_gles2_without_glx SKIP 0.01 s
8/18 glx_beginend OK 0.20 s
9/18 glx_public_api FAIL 0.22 s
10/18 glx_public_api_core FAIL 0.25 s
11/18 glx_glxgetprocaddress_nocontext OK 0.10 s
12/18 glx_has_extension_nocontext OK 0.09 s
13/18 glx_shared_znow FAIL 0.25 s
14/18 glx_alias_prefer_same_name SKIP 0.27 s
15/18 glx_gles2 FAIL 0.35 s
16/18 egl_and_glx_different_pointers_glx FAIL 0.66 s
17/18 egl_and_glx_different_pointers_egl SKIP 0.04 s
18/18 egl_and_glx_different_pointers_egl_glx OK 0.37 s
OK: 8
FAIL: 5
SKIP: 5
TIMEOUT: 0
1.4.3 passed
ninja: Entering directory `/var/cache/AUR/build/libepoxy/src/build'
ninja: no work to do.
1/18 header_guards OK 0.01 s
2/18 misc_defines OK 0.01 s
3/18 khronos_typedefs OK 0.01 s
4/18 egl_has_extension_nocontext OK 0.01 s
5/18 egl_gl SKIP 0.04 s
6/18 egl_gles1_without_glx SKIP 0.01 s
7/18 egl_gles2_without_glx SKIP 0.02 s
8/18 glx_beginend OK 0.16 s
9/18 glx_public_api OK 0.18 s
10/18 glx_public_api_core OK 0.26 s
11/18 glx_glxgetprocaddress_nocontext OK 0.13 s
12/18 glx_has_extension_nocontext OK 0.07 s
13/18 glx_shared_znow OK 0.24 s
14/18 glx_alias_prefer_same_name OK 0.25 s
15/18 glx_gles2 OK 0.25 s
16/18 egl_and_glx_different_pointers_glx OK 0.26 s
17/18 egl_and_glx_different_pointers_egl SKIP 0.04 s
18/18 egl_and_glx_different_pointers_egl_glx SKIP 0.19 s
OK: 13
FAIL: 0
SKIP: 5
TIMEOUT: 0
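To make the difference between the two runs easier to see, here is a minimal sketch (not part of the original report) that parses two such result listings and returns the tests whose status changed. The names `parse_results` and `changed_tests` are hypothetical, and the regular expression assumes the line format shown in the listings above.

```python
import re

# Matches result lines such as "9/18 glx_public_api FAIL 0.22 s".
# Summary lines such as "OK: 8" do not match and are skipped.
RESULT_LINE = re.compile(
    r"^\s*\d+/\d+\s+(\S+)\s+(OK|FAIL|SKIP|TIMEOUT)\s+[\d.]+\s*s\s*$"
)


def parse_results(log_text):
    """Map each test name to its status; lines that do not match are ignored."""
    results = {}
    for line in log_text.splitlines():
        match = RESULT_LINE.match(line)
        if match:
            results[match.group(1)] = match.group(2)
    return results


def changed_tests(old_log, new_log):
    """Return {test_name: (old_status, new_status)} for tests whose status differs."""
    old, new = parse_results(old_log), parse_results(new_log)
    return {
        name: (old[name], new[name])
        for name in old
        if name in new and old[name] != new[name]
    }
```

Applied to the two listings above (1.4.3 as the old log, 1.5.0 as the new one), this would report the five GLX tests that went from OK to FAIL, plus glx_alias_prefer_same_name (OK to SKIP) and egl_and_glx_different_pointers_egl_glx (SKIP to OK).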
git bisect says that the first offending commit is:
e5372a25baa9034b6223b32a0cab838c42779a39 is the first bad commit
commit e5372a25baa9034b6223b32a0cab838c42779a39
Author: Adam Jackson <ajax@redhat.com>
Date: Thu Sep 7 17:02:22 2017 -0400
dispatch: Fix the libOpenGL soname
Brown-paper-bag-for: Adam Jackson <ajax@redhat.com>
:040000 040000 02efd85a11cf2cd2abac215723f6adc5fe67de40 223766d5368ca2368007f031570b8c4dfeb90f2a M src
Full meson-logs/testlog.txt of failed check:
Not sure if this helps any, but here it goes...
Because of this (Arch BBS) report, I got to check the issues on libepoxy.
Because I couldn't downgrade (I had only the latest package version on my PC), I tried re-installing libepoxy from the AUR, because it was listed as version libepoxy-git 1.4.0.r0.g9628670-1.
It removed libepoxy (libepoxy 1.5.0-1) from the extra repository and installed libepoxy-git:
[2018-03-08 09:51] [ALPM] transaction started
[2018-03-08 09:51] [ALPM] installed xorg-util-macros (1.19.1-1)
[2018-03-08 09:51] [ALPM] installed ninja (1.8.2-1)
[2018-03-08 09:51] [ALPM] installed meson (0.45.0-1)
[2018-03-08 09:51] [ALPM] transaction completed
[2018-03-08 09:51] [ALPM] running 'systemd-update.hook'...
[2018-03-08 09:51] [ALPM] transaction started
[2018-03-08 09:51] [ALPM] removed libepoxy (1.5.0-1)
[2018-03-08 09:51] [ALPM] installed libepoxy-git (1.5.0.r1.gc28759f-1)
[2018-03-08 09:51] [ALPM] transaction completed
[2018-03-08 09:51] [ALPM] running 'systemd-update.hook'...
This seems to fix the issue at hand. Question is: What are the differences between (Arch and AUR) versions 1.5.0-1 and 1.5.0.r1 ?
Hope this can help you on your way. I'd like to install the officially supported repository version ASAP again ;)
@hmhofman that’s a question for Arch packagers.
@jonnyrobbie could you try to do what @nwnk suggested in the kwin bug?
@ebassi @nwnk done. epoxyinfo on libepoxy150 segfaults. The remaining three results are posted there as attachments.
@nwnk should we add epoxyinfo to epoxy proper, as a debugging tool?
Sure. I'll put a branch together at some point if nobody beats me to it.
@jonnyrobbie I have bisected the kwin issue and got the same breaking commit (e5372a25baa9034b6223b32a0cab838c42779a39). Reverting it fixes the issue.
The nvidia driver provides the libOpenGL.so.0 file. I think that before the change, epoxy failed to open it and fell back to libGL.so.1. After the change, opening it succeeds, but the result is somehow broken.
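The "try one library, fall back to the next" behaviour described here can be illustrated with a short sketch. This is not libepoxy's code (libepoxy is written in C); it only shows the general pattern using Python's `ctypes`, and the helper name `load_first_available` and its `loader` parameter are hypothetical. The two library names are the ones mentioned in this thread.

```python
import ctypes


def load_first_available(candidates, loader=ctypes.CDLL):
    """Return (name, handle) for the first library that loads, else (None, None).

    `loader` defaults to ctypes.CDLL, which raises OSError when the shared
    library cannot be opened; it is a parameter only so the fallback logic
    can be exercised without the real libraries installed.
    """
    for name in candidates:
        try:
            return name, loader(name)
        except OSError:
            # This candidate is missing or unloadable; try the next one.
            continue
    return None, None
```

Calling `load_first_available(["libOpenGL.so.0", "libGL.so.1"])` returns whichever of the two opens first. Note what the pattern cannot detect: if the first library opens successfully but then misbehaves, the fallback is never reached, which matches the suspicion above.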
@ebassi what is the purpose of the offending commit? I have a feeling that simply reverting it is not the best option.
The purpose is:
hyoscyamine:~% rpm -ql libglvnd-opengl
/usr/lib/.build-id
/usr/lib/.build-id/2a
/usr/lib/.build-id/2a/f15f1061e796bcd682fb2ddd7acc32cbdb1d68
/usr/lib64/libOpenGL.so.0
/usr/lib64/libOpenGL.so.0.0.0
Under glvnd we can avoid loading libGL.so.1, which we might very much like to do, because it pulls in libX11 and friends; instead we would load libOpenGL.so (and libGLX.so only if we can tell it's a GLX and not an EGL context), and that's what the patch attempts to do. So in that sense I think the patch is correct, and that something else is going wrong elsewhere. A backtrace from epoxyinfo from the broken configuration would still be useful.
I apologize for spreading the issue over two threads at the same time. I hope it's not too inconvenient. Anyway, here's the trace from the segfaulting epoxyinfo with libepoxy 1.5.0, created mostly by following the Arch guide.
That shows us getting a null GL extension string and feeding it to strstr. Arguably we shouldn't crash like that, but also a GL with a null extension string is not a thing (assuming you're not GL 1.0, and I promise you aren't), so what's really happening is the call to glGetString() is fizzling out and the "NULL" it returns is itself the problem.
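As a rough illustration of the defensive check hinted at here, the sketch below treats a missing extension string as "no extensions" instead of searching it. It is a Python analogue under stated assumptions, not epoxy's C code: `None` stands in for the NULL returned by glGetString(), and the function name `has_extension` is hypothetical.

```python
from typing import Optional


def has_extension(extension_string: Optional[str], name: str) -> bool:
    """Report whether `name` appears in a space-separated extension string.

    A missing string (None, standing in for a NULL pointer in C) is treated
    as "no extensions" rather than being searched.
    """
    if extension_string is None:
        return False
    # Compare whole tokens so that one extension name that is a prefix of
    # another does not produce a false positive.
    return name in extension_string.split()
```

A guard like this only avoids the crash. As noted above, the missing string is itself the symptom of the underlying problem, so the caller would still need to find out why the query failed.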
I'm not entirely sure why that would happen, offhand. I'll try to come up with either another test or some trace code for epoxy itself.
Question is: What are the differences between (Arch and AUR) versions 1.5.0-1 and 1.5.0.r1 ?
The AUR libepoxy-git package doesn't run any tests. There are also small differences in the meson setup between the two packages; link-time optimization (used in libepoxy 1.5.0) is the biggest one.
I'm having a similar issue on an Intel gpu using the open source drivers. Could this alternatively be caused by a bug in libglvnd 1.0.0-1?
If someone experiencing this problem can test this patch, it would be much appreciated:
https://github.com/nwnk/libepoxy/commit/a8c3faaa1990d98047e3c566409200604105fa9c
@nwnk :+1: I cloned the current master branch (https://github.com/anholt/libepoxy.git), applied your patch by hand (just to make sure that the patch is the only thing that changed), and ran the install commands. It did not take effect on the fly, so I rebooted the system. Now it seems to work.
Small side note: the nvidia-340xx drivers have also been updated on my system, and this was the first boot since. So this might not prove to be the full (or only) solution, but it might be. At least it DOES work on my system. The KDE KWin compositor now runs on both OpenGL 3.1 and OpenGL 2.0.
Here's my system:
Arch Linux
KDE Plasma: 5.12.4
KDE Frameworks: 5.44.0
Qt: 5.10.1
Kernel: 4.15.15-1-ARCH
Type OS: 64-bit
4x Intel Core i5-4430 CPU @3.00GHz
15.5 GiB RAM
2560 x 1024 pixels (765 x 302 mm)
85 x 86 dpi
Depth: 24, 1, 4, 8, 15, 16, 32
OpenGL (GLX & EGL)
NVIDIA Corporation
NVidia GT218 (GeForce 210)
GeForce 210/PCIe/SSE2
3.3.0 NVIDIA 340.106
@nwnk Spoke too soon. While kwin/compositor does not crash anymore, some functions do not work. These include (but are not limited to)
Could it be that the compositor falls back to XRender even though it says it is using OpenGL? Before applying this patch, the compositor would crash on OpenGL, so for the last couple of weeks I was running XRender.
Can you try this branch?
https://github.com/nwnk/libepoxy/tree/even-more-gentle-glx-detection
The branch in question was merged.
No comment in 6 months ⇒ closing.
Updating libepoxy from 1.4.3 to 1.5.0 breaks KWin compositing. Neither OGL 2 nor OGL 3.1 works. More info can be found here: https://bugs.kde.org/show_bug.cgi?id=391486 and https://bbs.archlinux.org/viewtopic.php?id=235021.
Is this a regression or an intentional change? It has been suggested that I report it upstream, which is here.