XmainframeX closed this issue 3 years ago.
This part of the stack trace looks wrong:
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff73e2503 in ?? () from /usr/lib64/libGLX_nvidia.so.0
(gdb) bt
#0 0x00007ffff73e2503 in ?? () from /usr/lib64/libGLX_nvidia.so.0
#1 0x00007ffff73bb480 in ?? () from /usr/lib64/libGLX_nvidia.so.0
#2 0x00007ffff73bd9b5 in glXCreateContextAttribsARB () from /usr/lib64/libGLX_nvidia.so.0
#3 0x00007ffff76d945d in glXCreateContextAttribsARB () from /usr/lib64/primus/libGL.so.1
#4 0x00007ffff77e4625 in BBGLXContext::BBGLXContext (this=0x7fffffffb380, display=0x7ffff77e5055 ":8")
at nv_vulkan_wrapper.cpp:61
#5 0x00007ffff77e4792 in StaticInitialize::StaticInitialize (this=0x7ffff77e7160 <init>)
at nv_vulkan_wrapper.cpp:102
When I run this, the calls are not routed through primus but go directly to libGLX_nvidia, which seems more correct to me:
$ pvkrun gdb --args vkcube
....
(gdb) b glXCreateContextAttribsARB
Function "glXCreateContextAttribsARB" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (glXCreateContextAttribsARB) pending.
(gdb) r
Starting program: /usr/bin/vkcube
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Breakpoint 1, 0x00007ffff6f466e0 in glXCreateContextAttribsARB () from /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
(gdb) bt
#0 0x00007ffff6f466e0 in glXCreateContextAttribsARB () from /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#1 0x00007ffff758bd64 in ?? () from /usr/lib/x86_64-linux-gnu/libGLX.so.0
#2 0x00007ffff76ff46b in BBGLXContext::BBGLXContext(char const*) () from /usr/lib/x86_64-linux-gnu/libnv_vulkan_wrapper.so.1
#3 0x00007ffff76ff51d in StaticInitialize::StaticInitialize() () from /usr/lib/x86_64-linux-gnu/libnv_vulkan_wrapper.so.1
Could you please provide the output of ldd libnv_vulkan_wrapper.so on your system? On my system this prints:
$ ldd libnv_vulkan_wrapper.so
linux-vdso.so.1 (0x00007ffcdbd19000)
libX11.so.6 => /usr/lib/x86_64-linux-gnu/libX11.so.6 (0x00007f973be45000)
libGLX.so.0 => /usr/lib/x86_64-linux-gnu/libGLX.so.0 (0x00007f973be11000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f973be0b000)
libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f973bc3e000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f973bc24000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f973ba5f000)
libxcb.so.1 => /usr/lib/x86_64-linux-gnu/libxcb.so.1 (0x00007f973ba33000)
libGLdispatch.so.0 => /usr/lib/x86_64-linux-gnu/libGLdispatch.so.0 (0x00007f973b97b000)
/lib64/ld-linux-x86-64.so.2 (0x00007f973bfba000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f973b837000)
libXau.so.6 => /usr/lib/x86_64-linux-gnu/libXau.so.6 (0x00007f973b633000)
libXdmcp.so.6 => /usr/lib/x86_64-linux-gnu/libXdmcp.so.6 (0x00007f973b42d000)
libbsd.so.0 => /usr/lib/x86_64-linux-gnu/libbsd.so.0 (0x00007f973b411000)
Tomorrow I will have a closer look at which symbol resolves where on my system and hopefully have more debugging hints for you.
Thanks a lot for the reply! Here is the output you requested:
$ ldd libnv_vulkan_wrapper.so.1
linux-vdso.so.1 (0x00007ffd3c578000)
libX11.so.6 => /usr/lib64/libX11.so.6 (0x00007fb4fa677000)
libGLX.so.0 => /usr/lib64/libGLX.so.0 (0x00007fb4fa644000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fb4fa63f000)
libstdc++.so.6 => /usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/libstdc++.so.6 (0x00007fb4fa3c6000)
libm.so.6 => /lib64/libm.so.6 (0x00007fb4fa290000)
libgcc_s.so.1 => /usr/lib/gcc/x86_64-pc-linux-gnu/9.3.0/libgcc_s.so.1 (0x00007fb4fa276000)
libc.so.6 => /lib64/libc.so.6 (0x00007fb4fa0b9000)
libxcb.so.1 => /usr/lib64/libxcb.so.1 (0x00007fb4fa08e000)
libGLdispatch.so.0 => /usr/lib64/libGLdispatch.so.0 (0x00007fb4f9fd6000)
/lib64/ld-linux-x86-64.so.2 (0x00007fb4fa805000)
libXau.so.6 => /usr/lib64/libXau.so.6 (0x00007fb4f9fd1000)
libXdmcp.so.6 => /usr/lib64/libXdmcp.so.6 (0x00007fb4f9fc9000)
libbsd.so.0 => /usr/lib64/libbsd.so.0 (0x00007fb4f9fad000)
I investigated a bit further why the /usr/lib64/primus/libGL.so.1 library is loaded. It seems that primusrun prefixes LD_LIBRARY_PATH with /usr/lib64/primus, so it gets set when pvkrun uses primusrun. I temporarily commented out this LD_LIBRARY_PATH change in primusrun, but this results in the following output:
$ LC_ALL=C pvkrun vkcube
Can't open bumblebee display.
INTEL-MESA: warning: Performance support disabled, consider sysctl dev.i915.perf_stream_paranoid=0
PrimusVK: Searching for display GPU:
PrimusVK: 0x561ae5886800: 32902;22811
PrimusVK: Got integrated gpu!
PrimusVK: Device: Intel(R) HD Graphics 630 (KBL GT2)
PrimusVK: Type: 1
PrimusVK: Searching for render GPU:
PrimusVK: 0x561ae5886800.
PrimusVK: No device for the rendering GPU found. Is the correct driver installed?
PrimusVK: VK_ICD_FILENAMES not set
vkcube: /var/tmp/portage/dev-util/vulkan-tools-1.2.162/work/Vulkan-Tools-1.2.162/cube/cube.c:3271: demo_init_vk: Assertion `!err' failed.
Aborted (core dumped)
Is this of any help?
The primus libGL.so.1 must be in the path for primus/primus-vk to work, as primus-vk also uses it to power up the dedicated graphics card (and for OpenGL it is needed to offload render commands), so that works as intended. I also managed to pull up a backtrace that looks closer to what you are seeing:
#0 0x00007fffeceba6e0 in glXCreateContextAttribsARB () from /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.0
#1 0x00007ffff72a4d64 in ?? () from /usr/lib/x86_64-linux-gnu/libGLX.so.0
#2 0x00007ffff74719ed in glXCreateContextAttribsARB () from /usr/lib/x86_64-linux-gnu/primus/libGL.so.1
#3 0x00007ffff7fc646b in BBGLXContext::BBGLXContext(char const*) () from /usr/lib/x86_64-linux-gnu/libnv_vulkan_wrapper.so.1
#4 0x00007ffff7fc651d in StaticInitialize::StaticInitialize() () from /usr/lib/x86_64-linux-gnu/libnv_vulkan_wrapper.so.1
It seems that you are skipping libGLX here. This raises the question: does primus work on your system (e.g. primusrun glxgears)?
Additionally, it would be interesting to know which libraries primus is configured with. What does optirun -b primus env output (especially PRIMUS_libGLd and PRIMUS_libGLa)? And do you have glvnd? That is, which package provides the system-wide libGL.so.1: Mesa or NVIDIA?
I'll try to answer your questions:
Primus works on my system: the primusrun glxgears window shows up and primusrun glxinfo shows the NVIDIA GPU.
I have not configured any libraries for primus manually. optirun -b primus env shows:
PRIMUS_libGLa=/usr/lib64/opengl/nvidia/lib/libGL.so.1:/usr/lib/opengl/nvidia/lib/libGL.so.1
PRIMUS_libGLd=/usr/$LIB/libGL.so.1:/usr/lib/$LIB/libGL.so.1:/usr/$LIB/mesa/libGL.so.1:/usr/lib/$LIB/mesa/libGL.so.1
The files listed behind PRIMUS_libGLa don't exist on my system. However, if I do
export PRIMUS_libGLa='/usr/lib64/libGLX_nvidia.so'
export PRIMUS_libGLd='/usr/lib64/libGL.so'
or
export PRIMUS_libGLa='/usr/lib64/libGLX_nvidia.so'
export PRIMUS_libGLd='/usr/lib64/libGLX.so'
then neither the behaviour of primusrun glxinfo (it still shows the NVIDIA GPU) nor that of pvkrun vulkaninfo (it still segfaults) changes.
I have glvnd. In fact, my system-wide libGL.so.1 is provided by a package named media-libs/libglvnd, so I guess it is not from NVIDIA. In addition, I have unset the nvidia-driver's compat flag ("Install non-GLVND libGL for backwards compatibility"), so I think everything should be set up for glvnd on my system.
Don't put the libGLXes in there. Use the libGL. And if glvnd is working on your system, you should use the system libGL.so.1. So please try:
export PRIMUS_libGLa='/usr/lib64/libGL.so.1'
export PRIMUS_libGLd='/usr/lib64/libGL.so.1'
Assuming that is your glvnd-provided libGL.
Thanks for the hint. I have exported PRIMUS_libGLa and PRIMUS_libGLd as you described; this libGL.so.1 is the one that my glvnd provides. However, even though optirun -b primus env now shows
PRIMUS_libGLa=/usr/lib64/libGL.so.1
PRIMUS_libGLd=/usr/lib64/libGL.so.1
the behaviour of pvkrun vkcube is still the same, and the stack trace still shows this when the segfault happens:
#0 0x00007ffff732a503 in ?? () from /usr/lib64/libGLX_nvidia.so.0
#1 0x00007ffff7303480 in ?? () from /usr/lib64/libGLX_nvidia.so.0
#2 0x00007ffff73059b5 in glXCreateContextAttribsARB () from /usr/lib64/libGLX_nvidia.so.0
#3 0x00007ffff762145d in glXCreateContextAttribsARB () from /usr/lib64/primus/libGL.so.1
#4 0x00007ffff7b02625 in BBGLXContext::BBGLXContext (this=0x7fffffffbac0, display=0x7ffff7b03055 ":8")
at nv_vulkan_wrapper.cpp:61
#5 0x00007ffff7b02792 in StaticInitialize::StaticInitialize (this=0x7ffff7b05160 <init>)
at nv_vulkan_wrapper.cpp:102
You mentioned before ( https://github.com/felixdoerre/primus_vk/issues/88#issuecomment-774737455 ) that you were able to pull up a stack trace which looks more like mine and also contains /usr/lib64/primus/libGL.so.1. Did this run succeed on your device, or did it also segfault like mine? I'm asking because I'm wondering whether we should concentrate on getting primus/libGL.so.1 out of the backtrace or whether the problem might have other origins.
The run succeeded. Just for reference: to get this stack trace, I forced OpenGL (and thereby primus) to be loaded before Vulkan by running ./primus_vk_diag gl vulkan gl. That way I got a more "similar" backtrace, where libnv_vulkan_wrapper.so.1 loads the primus libGL. So I guess the goal should be to get libGLX into the backtrace rather than to get primus out of it.
Also what is strange:
The files listed behind PRIMUS_libGLa don't exist on my system. However, if I do....
But that should be a fatal error. When I set PRIMUS_libGLa to something non-existent on my system, this is the output:
PRIMUS_libGLa=/usr/lib64/opengl/nvidia/lib/libGL.so.1:/usr/lib/opengl/nvidia/lib/libGL.so.1 pvkrun glxgears
primus: fatal: failed to load any of the libraries: /usr/lib64/opengl/nvidia/lib/libGL.so.1:/usr/lib/opengl/nvidia/lib/libGL.so.1
/usr/lib64/opengl/nvidia/lib/libGL.so.1: cannot open shared object file: No such file or directory
/usr/lib/opengl/nvidia/lib/libGL.so.1: cannot open shared object file: No such file or directory
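The error above is consistent with PRIMUS_libGLa being treated as a colon-separated list of candidates that are tried in order, with a fatal error only when none of them can be loaded. The shell sketch below illustrates that rule; pick_primus_lib is a hypothetical helper, not part of primus. It assumes that "loadable" can be approximated by "the file exists" (primus itself calls dlopen) and that the literal $LIB token expands to a single directory name, e.g. lib64 on x86-64 Gentoo, which is what the dynamic linker does for dlopen paths.

```sh
# Hypothetical helper approximating how a PRIMUS_libGLa/PRIMUS_libGLd value is
# consumed: candidates are tried left to right, the first usable one wins, and
# it is an error if none is usable.
# Assumptions: "usable" is approximated by "the file exists" (primus really
# dlopen()s it), and the literal token $LIB is replaced by the given directory
# name, which the dynamic linker would otherwise do inside dlopen().
pick_primus_lib() {
    libdir=$1
    rest=$2
    while [ -n "$rest" ]; do
        candidate=${rest%%:*}            # first colon-separated entry
        case $rest in
            *:*) rest=${rest#*:} ;;      # drop it and the following colon
            *)   rest= ;;                # it was the last entry
        esac
        [ -n "$candidate" ] || continue  # skip empty entries such as "a::b"
        # Expand the literal "$LIB" token (sed receives the pattern \$LIB).
        path=$(printf '%s\n' "$candidate" | sed "s|\\\$LIB|$libdir|g")
        if [ -e "$path" ]; then
            printf '%s\n' "$path"
            return 0
        fi
        printf '%s: not found\n' "$path" >&2
    done
    printf 'no usable library in: %s\n' "$2" >&2
    return 1
}
```

For example, pick_primus_lib lib64 '/usr/lib64/opengl/nvidia/lib/libGL.so.1:/usr/lib/opengl/nvidia/lib/libGL.so.1' reports each missing path and returns 1 on a system without those files, whereas pick_primus_lib lib64 '/usr/$LIB/libGL.so.1' prints /usr/lib64/libGL.so.1 where that file exists.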
You noted that you have overridden the PRIMUS configuration environment variables by exporting them. Can you try ENABLE_PRIMUS_LAYER=1 optirun -b primus env PRIMUS_libGLa=/usr/lib64/libGL.so.1 PRIMUS_libGLd=/usr/lib64/libGL.so.1 vkcube now?
I guess I found the issue. But let's go through your last comment step by step.
I actually have no idea why optirun works for me when PRIMUS_libGLa is set to something non-existent.
However, it turned out that your proposed ENABLE_PRIMUS_LAYER=1 optirun -b primus env PRIMUS_libGLa=/usr/lib64/libGL.so.1 PRIMUS_libGLd=/usr/lib64/libGL.so.1 vkcube call worked! Curious why optirun with manually set primus environment variables works but primusrun does not, I had another look at my /usr/bin/primusrun and saw that it contains the following lines:
export PRIMUS_libGLa='/usr/$LIB/libGLX_nvidia.so.0'
export PRIMUS_libGLd='/usr/$LIB/libGLX.so.0'
These lines seem to override any previously exported primus environment variables with two values that, as you pointed out, are not the correct ones for a GLVND setup.
Modifying /usr/bin/primusrun to contain
export PRIMUS_libGLa='/usr/$LIB/libGL.so.1'
export PRIMUS_libGLd='/usr/$LIB/libGL.so.1'
fixes the issue. With this modification, both pvkrun vkcube and pvkrun vulkaninfo run just fine.
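As a sketch of a less intrusive variant (not what upstream primusrun or the Gentoo ebuild ships), the two exports can be turned into defaults that only apply when the user has not already set the variables, so that an exported PRIMUS_libGLa/PRIMUS_libGLd is no longer silently replaced. The single quotes keep $LIB literal so that the dynamic linker, not the shell, expands it when primus loads the library.

```sh
# Apply the GLVND-friendly defaults only if the user has not exported
# PRIMUS_libGLa / PRIMUS_libGLd already (unset or empty -> use the default).
# Single quotes keep "$LIB" literal; the dynamic linker expands it at dlopen() time.
if [ -z "${PRIMUS_libGLa-}" ]; then
    PRIMUS_libGLa='/usr/$LIB/libGL.so.1'
fi
if [ -z "${PRIMUS_libGLd-}" ]; then
    PRIMUS_libGLd='/usr/$LIB/libGL.so.1'
fi
export PRIMUS_libGLa PRIMUS_libGLd
```

With this variant, running export PRIMUS_libGLa=/usr/lib64/libGL.so.1 before calling primusrun or pvkrun takes effect instead of being overwritten.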
I tried to track down the source of this issue and found that the original primusrun at https://github.com/amonakov/primus/blob/master/primusrun does not contain any active PRIMUS_libGLa/PRIMUS_libGLd exports, so these have to be configured by the user. However, the Gentoo ebuild for primus (https://gitweb.gentoo.org/repo/gentoo.git/tree/x11-misc/primus/primus-0.2-r2.ebuild) seems to introduce those GLVND-incompatible exports. Do you agree that this should be reported as a bug in this ebuild? Then I'd report it there and close the issue here.
Yes, I'd agree that this should be reported as a bug in the ebuild and fixed there. I'll give a bit more context that might help convince others why libGL.so.1 is correct there and why both libGLX and libGLX_nvidia are bad choices: GLVND introduces the vendor-neutral libGL.so.1 and libGLX.so.0, which are intended to be the interface used by OpenGL applications (see also this diagram: https://github.com/NVIDIA/libglvnd#architecture). These libraries have the job of figuring out the correct libGLX_vendor.so.0, loading it, and passing on the function pointers from there, depending on which GLX provider is used for the corresponding display. So in a way, libGL.so.1/libGLX.so.0 take the place for OpenGL that the Vulkan loader has for Vulkan.
The shortcut of directly referencing libGLX_vendor instead of libGL.so.1 (or even libGLX) is in fact a "lucky guess", as the libGLX_vendor library is not required to export any OpenGL symbols at all, only __glx_Main (see e.g. here: https://github.com/NVIDIA/libglvnd/blob/acc654454867c7cdd681cc1f60f858bcd6e5e729/include/glvnd/libglxabi.h). Those symbols were initially only there by accident and will now likely stay for compatibility with everything that already builds upon them, but it would be better to use the "correct" way.
For reference: this problem should be fixed not only in primusrun but also directly in optirun/bumblebee, so that primusrun and optirun -b primus both work as intended. When the environment variables are not set, primus asks the bumblebee daemon for these libraries, and with the correct settings the daemon will report the correct libraries. This is the corresponding line from /etc/bumblebee/bumblebee.conf on Debian:
[driver-nvidia]
.....
LibraryPath=/usr/lib/x86_64-linux-gnu/nvidia:/usr/lib/i386-linux-gnu/nvidia:/usr/lib/x86_64-linux-gnu:/usr/lib/i386-linux-gnu
The last segments of this search path cause primus to find /usr/lib/x86_64-linux-gnu/libGL.so.1 (see the logic here: https://github.com/amonakov/primus/blob/d1afbf6fce2778c0751eddf19db9882e04f18bfd/libglfork.cpp#L199).
This is also (roughly) the way that optirun fetches the libraries (https://github.com/Bumblebee-Project/Bumblebee/blob/aaa1b42724f917c523515efe35e7af03bd755160/src/optirun.c#L226), so now both "launchers" get their libraries from the same source.
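The "first match wins" behaviour of such a search path can be illustrated with a small shell helper. find_in_libpath is hypothetical and only tests for file existence; it is not a port of the linked libglfork.cpp or optirun.c code, just a sketch of the search order under that simplifying assumption. The dynamic linker applies the same left-to-right rule to LD_LIBRARY_PATH, which is why prefixing it with /usr/lib64/primus makes the primus libGL.so.1 take precedence over the system one.

```sh
# Hypothetical helper: print the first DIR/NAME that exists, scanning the
# colon-separated directory list from left to right (first match wins).
find_in_libpath() {
    name=$1
    rest=$2
    while [ -n "$rest" ]; do
        dir=${rest%%:*}                  # first segment
        case $rest in
            *:*) rest=${rest#*:} ;;      # drop it and the following colon
            *)   rest= ;;                # it was the last segment
        esac
        # Skip empty segments (e.g. from "a::b") and directories without NAME.
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            printf '%s\n' "$dir/$name"
            return 0
        fi
    done
    return 1
}
```

For example, find_in_libpath libGL.so.1 /usr/lib/x86_64-linux-gnu/nvidia:/usr/lib/i386-linux-gnu/nvidia:/usr/lib/x86_64-linux-gnu:/usr/lib/i386-linux-gnu would be expected to print /usr/lib/x86_64-linux-gnu/libGL.so.1 on a GLVND-based Debian system, assuming the nvidia subdirectories contain no libGL.so.1.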
So, to sum it up: my suggested fix would be to remove all overrides from primusrun (so that the compiled-in default values apply: https://github.com/amonakov/primus/blob/d1afbf6fce2778c0751eddf19db9882e04f18bfd/Makefile#L18) and adjust bumblebee.conf accordingly. That way optirun and primusrun should both work.
I hope this helps sort out this mess :D.
Thanks a lot for the guidance for debugging and all the in-depth information!
Since this problem seems to be introduced by the gentoo ebuild for primus, I have filed a bug there https://bugs.gentoo.org/770193 and will close the issue here.
Hi, I have installed primus_vk from source on my Gentoo laptop with an integrated Intel GPU and a dedicated NVIDIA GTX 1050 GPU. I have applied some patches (primus_vk_gentoo_patch.txt) to primus_vk to
I have removed NVIDIA's ICD file so that the remaining configuration looks like this:
Now when I run pvkrun vulkaninfo or pvkrun vkcube, both end up with a segmentation fault without further explanation. When I run the whole thing in a debugger
pvkrun gdb --args vulkaninfo
and print a stack trace after the segfault, it looks like this:
It seems like Vulkan calls are nicely forwarded from libvulkan.so to libnv_vulkan_wrapper.so.1 to libGLX_nvidia.so.0, which segfaults somewhere in glXCreateContextAttribsARB().
Considering that I'm using Gentoo, I already checked some suggestions I found online:
Maybe I should also mention that I'm running GNOME on Wayland.
I'd really be glad about any hint on how to solve / debug this issue!