twhitehead opened this issue 1 day ago
I see GCC has a musttail attribute that can be applied to a return statement whose expression is a function call:
[[gnu::musttail]] return foo();
Could it be as simple as that? (Well, a bit more complex, actually, since the final call is two levels deep, with a comment saying there must have been a good reason for this...)
This has been brought up before (see #107, #250), and relevant application recipes have been added to work around it. (I could add one for ParaView as well.) It would be nice to work around it more generally. I'll experiment with the musttail attribute.
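For reference, here is a minimal sketch of what a musttail-based dlopen() interposer might look like. It is not VirtualGL's libdlfaker code. The helper next_dlopen() and the MUSTTAIL macro are hypothetical names, and the sketch assumes glibc plus a compiler that understands the musttail attribute (Clang 13 or later, or GCC 15 or later). With other compilers it falls back to an ordinary return, where a tail call is not guaranteed. If the final call really is two levels deep, each level would need its own musttail return, and Clang requires the caller and callee signatures to match.

```c
#define _GNU_SOURCE            /* for RTLD_NEXT */
#include <dlfcn.h>
#include <stddef.h>

/* Use musttail only when the compiler supports it; otherwise fall back to a
   plain return, where the compiler MAY but need not emit a tail call. */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

typedef void *(*dlopen_fn)(const char *, int);

/* Hypothetical helper: find the next dlopen() after this object in the
   lookup order (normally glibc's). Not thread-safe; a real interposer
   would guard the lazy initialization. */
static dlopen_fn next_dlopen(void)
{
    static dlopen_fn fn;
    if (fn == NULL)
        fn = (dlopen_fn)dlsym(RTLD_NEXT, "dlopen");
    return fn;
}

/* Interposed dlopen(). Any redirection logic (e.g. for libGL) would run
   before the final call, which stays in tail position so that the real
   dlopen() sees this function's caller as its return address. */
void *dlopen(const char *filename, int flags)
{
    dlopen_fn real = next_dlopen();
    if (real == NULL)
        return NULL;
    MUSTTAIL return real(filename, flags);
}
```

Built as a shared object and loaded via LD_PRELOAD, every dlopen() in the process would pass through this function.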
Having debugged this with @twhitehead, I was wondering about the reason for using libdlfaker by default. As far as I can see, it's meant for programs that dlopen() libGL.so instead of linking to it directly. From what I understand, this used to be really common, but maybe a little less so nowadays? ParaView links to it directly, but I think MATLAB, for example, doesn't.
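To illustrate the pattern, here is a minimal sketch of an application that loads a symbol at run time instead of linking the library directly. The function name load_symbol() is hypothetical; dlopen(), dlsym(), dlerror(), and dlclose() are the standard POSIX calls. This is the kind of dlopen() call that a preloaded interposer such as libdlfaker sits in front of.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*generic_fn)(void);

/* Hypothetical helper: open a shared library at run time and look up one
   symbol in it. On success, stores the handle in *handle_out (the caller
   must dlclose() it). On failure, returns NULL and leaves *handle_out as is. */
static generic_fn load_symbol(const char *lib, const char *sym, void **handle_out)
{
    void *handle = dlopen(lib, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", lib, dlerror());
        return NULL;
    }
    dlerror();                          /* clear any stale error state */
    void *addr = dlsym(handle, sym);
    if (addr == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s) failed: %s\n", sym,
                err != NULL ? err : "symbol resolved to NULL");
        dlclose(handle);
        return NULL;
    }
    *handle_out = handle;
    return (generic_fn)addr;
}
```

For example, load_symbol("libGL.so.1", "glXGetProcAddress", &handle) would fetch the GLX entry point that such applications then use to resolve the remaining OpenGL functions.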
It's perhaps even more common now than it used to be. There are entire frameworks that use dlopen()/dlsym() to load OpenGL functions, so any application built with one of those frameworks would fail if the dlopen() interposer weren't preloaded by default. The goal of VirtualGL is to "just work" for the maximum possible number of applications, i.e. to minimize the number of application recipes required. Thus, it would be really nice to work around the RPATH issue, because that would allow me to remove even more recipes. However, as of this moment, that issue is known to affect only three applications. Furthermore, I'm not sure which version of ParaView started experiencing the issue, but I know that older versions worked fine with VirtualGL.
For completeness, here is the sequence of events (i.e., why ParaView crashes):
paraview links to libospray.so.2 via libvtkRenderingRayTracing-pv5.11.so. libvtkRenderingRayTracing-pv5.11.so has an RPATH to the directory containing libospray.so.2.
libospray.so.2 successfully dlopen()s libospray_module_cpu.so.2 in the same directory.
libospray_module_cpu.so.2 links directly to libispcrt.so.1; the RPATH to libispcrt.so.1 is in libospray_module_cpu.so.2, but not in the upper levels.
libispcrt.so.1 tries to dlopen() libispcrt_device_cpu.so.1 (in the same directory as libispcrt.so.1), which fails with VirtualGL even though libispcrt.so.1 and libospray_module_cpu.so.2 have that directory in their RPATH. Instead (with libdlfaker), it searches using the top-level RPATH from the paraview executable.
Hope that helps for reference.
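As I understand glibc's behavior, dlopen() decides whose RPATH to search by checking which loaded object its return address falls in. A wrapper that calls the real dlopen() therefore makes libdlfaker.so (and, through it, the executable) look like the caller, rather than libispcrt.so.1. The sketch below uses dladdr() to show which object a call came from. object_containing() and caller_object() are hypothetical helper names, and the code assumes glibc, where dladdr() requires _GNU_SOURCE.

```c
#define _GNU_SOURCE            /* for dladdr() and Dl_info in glibc */
#include <dlfcn.h>
#include <stddef.h>

/* Path of the loaded object (executable or shared library) whose mapping
   contains addr, or NULL if dladdr() cannot attribute the address. */
static const char *object_containing(const void *addr)
{
    Dl_info info;
    if (dladdr(addr, &info) == 0)
        return NULL;
    return info.dli_fname;
}

/* Which object did the call to this function come from? Logging this inside
   an interposer would show that, without a tail call, the real dlopen()'s
   caller is the interposer itself rather than the library that asked. */
__attribute__((noinline)) static const char *caller_object(void)
{
    return object_containing(__builtin_return_address(0));
}
```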
When libdlfaker.so hooks dlopen(), the ultimate dlopen() call comes from libdlfaker.so instead of from the original library. As a result, any RPATHs in the original library are not used to search for the library being dlopen()ed.
I discovered this on the Digital Research Alliance of Canada's clusters when using the newest ParaView. Here is a screenshot showing how ParaView fails to load due to a dependent library not finding a library it requires:
Here is a screenshot showing how the libraries resolve properly when you remove libdlfaker.so from between the library and glibc:
I realize this is probably a difficult item to resolve, but I opened this ticket to at least document the issue.
That said, I think it is actually technically possible, since the final libdlfaker.so call to the actual dlopen() is always (except when tracing is enabled) in a tail-call position. That is, it looks like … which means you could technically tail-call dlopen_real (i.e., drop libdlfaker.so's stack frame and jump to dlopen_real instead of calling it), so the top-level return address would be in the original library's address space, and glibc would then presumably apply that library's RPATH.
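One way to check whether a compiler actually honors the tail call is to compare return addresses. In this hypothetical sketch, report_return_address() stands in for the real dlopen() (which, as I understand it, uses its own return address to identify the caller), and wrapper() stands in for libdlfaker's wrapper. With musttail, two calls to wrapper() from different call sites report different return addresses, because the callee returns directly into the original caller. Without a guaranteed tail call, both may report the same address inside wrapper().

```c
#include <stddef.h>

/* Use musttail only if the compiler supports it (Clang 13+, GCC 15+). */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define HAVE_MUSTTAIL 1
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define HAVE_MUSTTAIL 0
#  define MUSTTAIL            /* tail call not guaranteed */
#endif

/* Volatile counter: a visible side effect that also keeps the compiler
   from merging repeated calls into one. */
static volatile int calls;

/* Stand-in for the real dlopen(): reports the address it will return to. */
__attribute__((noinline)) static void *report_return_address(int unused)
{
    (void)unused;
    calls = calls + 1;
    return __builtin_return_address(0);
}

/* Stand-in for the interposer: its final call is in tail position. */
__attribute__((noinline)) static void *wrapper(int x)
{
    MUSTTAIL return report_return_address(x);
}
```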