VirtualGL / virtualgl

Main VirtualGL repository
https://VirtualGL.org

libdlfaker.so interferes with library specific RPATHs #266

Open twhitehead opened 1 day ago

twhitehead commented 1 day ago

When libdlfaker.so hooks dlopen it causes the ultimate dlopen call to come from libdlfaker.so instead of the original library. This results in any RPATHs in that library not being used to search for the library being dlopened.

I discovered this on the Digital Research Alliance of Canada's clusters when using the newest ParaView. Here is a screenshot showing how ParaView fails to load because a dependent library cannot find a library it requires: [screenshot: dlfaker-error]

Here is a screenshot showing how the libraries resolve properly when you remove libdlfaker.so from between the library and glibc: [screenshot: dlfaker-workaround]

I realize this is probably a difficult item to resolve, but I opened this ticket to at least document the issue.

That said, I think it is actually technically possible, as the final libdlfaker.so call to the actual dlopen is always (except when tracing is enabled) in a tail call position. That is, it looks like

  return dlopen_real(...);

which means you could technically tail call the real dlopen_real (i.e., drop libdlfaker.so's stack frame and jump to dlopen_real instead of calling it), so that the top-level return address would be in the original library's address space and glibc would then presumably apply that library's RPATH.
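To illustrate the return-address point, here is a minimal sketch in C. It assumes, as supposed above, that the callee identifies its caller from the return address. Both `callee_sees` and `hook_hides_caller` are hypothetical stand-ins, not VirtualGL or glibc functions, and `__builtin_return_address` is a GCC/Clang builtin.

```c
#include <stddef.h>

/* Hypothetical stand-in for the real dlopen: it reports the address it
   will return to, which is how a callee can tell who called it. */
__attribute__((noinline)) static void *callee_sees(void)
{
    return __builtin_return_address(0);
}

/* Hypothetical stand-in for an interposed hook that calls the real
   function in the ordinary (non-tail) way. Returns 1 when the callee saw
   an address inside this hook rather than the hook's own caller. */
__attribute__((noinline)) int hook_hides_caller(void)
{
    void *hook_caller = __builtin_return_address(0);
    void *seen = callee_sees();

    /* Using the result after the call keeps it out of tail position. */
    return seen != hook_caller;
}
```

With an ordinary call, `hook_hides_caller()` returns 1: the callee sees the hook, not the code that called the hook. A true tail call would instead leave the hook's caller as the return address.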

twhitehead commented 1 day ago

I see gcc has a musttail attribute that can be applied to a function call return

  [[gnu::musttail]] return foo();

Could it be as simple as that (well, a bit more complex actually, as the final call is actually two levels deep with a comment that there must have been a good reason for this...)?
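For reference, a rough sketch of what a forced tail call could look like in C. The names `real_open`, `hooked_open`, and the `MUSTTAIL` macro are made up for illustration and are not taken from the VirtualGL source. The attribute is only used when the compiler advertises it, and as far as I know it needs the wrapper and its target to have matching signatures.

```c
/* Use musttail only where the compiler advertises it; otherwise fall
   back to an ordinary return (which may or may not become a tail call). */
#if defined(__has_attribute)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical stand-in for the real dlopen. */
__attribute__((noinline)) int real_open(const char *name, int flags)
{
    return (name != 0) + flags;
}

/* Hypothetical hook with the same signature as its target. With musttail
   in effect, this frame is dropped and control jumps to real_open. */
int hooked_open(const char *name, int flags)
{
    MUSTTAIL return real_open(name, flags);
}
```

If the final call really is two levels deep, presumably each level would need the same treatment for the original caller's return address to survive.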

dcommander commented 1 day ago

This has been brought up before (see #107, #250), and relevant application recipes have been added to work around it. (I could add one for ParaView as well.) It would be nice to work around it more generally. I'll experiment with the musttail attribute.

bartoldeman commented 16 hours ago

Having debugged this with @twhitehead, I was wondering about the reason for using libdlfaker by default, and as far as I can see it's meant for programs that dlopen libGL.so instead of linking to it directly. From what I understand this used to be really common, but nowadays maybe a little less? ParaView links to it directly, but I think e.g. MATLAB doesn't.

dcommander commented 16 hours ago

It's perhaps even more common now than it used to be. There are entire frameworks that use dlopen()/dlsym() to load OpenGL functions, so any application built with one of those frameworks would fail if the dlopen() interposer weren't preloaded by default. The goal of VirtualGL is to "just work" for the maximum possible number of applications, i.e. to minimize the number of application recipes required. Thus, it would be really nice to work around the RPATH issue, because that would allow me to remove even more recipes. However, as of this moment, that issue is known to affect only three applications. Furthermore, I'm not sure which version of ParaView started experiencing the issue, but I know that older versions worked fine with VirtualGL.

bartoldeman commented 14 hours ago

For completeness, the sequence that makes ParaView crash here is as follows:

paraview links to libospray.so.2 via libvtkRenderingRayTracing-pv5.11.so. libvtkRenderingRayTracing-pv5.11.so has an RPATH to the dir with libospray.so.2.

libospray.so.2 successfully dlopens libospray_module_cpu.so.2 in the same directory

libospray_module_cpu.so.2 directly links to libispcrt.so.1; the RPATH to libispcrt.so.1 is in libospray_module_cpu.so.2, but not in the upper levels.

libispcrt.so.1 tries to dlopen libispcrt_device_cpu.so.1 (in the same directory as libispcrt.so.1), which fails with virtualgl, even if libispcrt.so.1 and libospray_module_cpu.so.2 have that in their RPATH. Instead (with libdlfaker) it searches using the top-level RPATH from the paraview executable.

Hope that helps for reference.