NVIDIA / egl-wayland

The EGLStream-based Wayland external platform
MIT License

Correct use of DRI_PRIME and __NV_PRIME_RENDER_OFFLOAD for OpenGL applications under Wayland #98

Closed tim-rex closed 2 months ago

tim-rex commented 5 months ago

I'm not too clear on the difference between DRI_PRIME and __NV_PRIME_RENDER_OFFLOAD, or when one should be used over the other. However, I appear to have issues with both arrangements when running my own OpenGL application, which I believe are related to EGL initialisation (or possibly driver support) and a subsequent fallback to Mesa/swrast.

I'm running an RX 580 as my primary display device and a GTX 960 as my secondary device. I encounter no such issues when running with just the NVIDIA drivers (and thus no confusion between drivers or format modifiers, and no need for DRI_PRIME).


Running with DRI_PRIME=10de:13c2 will cause EGL to correctly identify the GTX 960 for rendering, however any attempt to eglInitialize() results in libEGL warning: egl: failed to create dri2 screen, and I find EGL then falls back to Mesa with swrast or zink.

I observe similar behaviour with eglgears, eglgears_wayland, es2gears, and es2gears_wayland. glxgears behaves similarly, with failed to create dri3 screen errors before falling back to Mesa llvmpipe.

Is it expected that DRI_PRIME targeting a GTX 970 should fail to create a dri2 screen?

I strongly suspect this is the same DRM format modifier support issue described in #96, but I raise it only because I had thought this to be working previously. I suspect I was mistaken in that, and that the fallback to swrast very probably led me to believe it was working.


Running with __NV_PRIME_RENDER_OFFLOAD=1 results in glCheckFramebufferStatus(GL_FRAMEBUFFER) returning GL_FRAMEBUFFER_UNDEFINED.

Meanwhile, the likes of es2gears_wayland and eglgears_wayland both hang (per the backtrace below), which I grant is very probably an issue with their implementation, or perhaps with the GNOME desktop, and nothing to do with NVIDIA.

0x00007ffff7cd3f6f in __GI___poll (fds=fds@entry=0x7fffffffe280, nfds=3, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
29        return SYSCALL_CANCEL (poll, fds, nfds, timeout);                                                                                                                       
(gdb) bt
#0  0x00007ffff7cd3f6f in __GI___poll (fds=fds@entry=0x7fffffffe280, nfds=3, timeout=timeout@entry=-1) at ../sysdeps/unix/sysv/linux/poll.c:29
#1  0x00005555555577f9 in poll (__timeout=-1, __nfds=<optimized out>, __fds=0x7fffffffe280) at /usr/include/bits/poll2.h:39
#2  _eglutNativeEventLoop () at ../mesa-demos-9.0.0/src/egl/eglut/eglut_wayland.c:524
#3  _eglutNativeEventLoop () at ../mesa-demos-9.0.0/src/egl/eglut/eglut_wayland.c:477
#4  eglutMainLoop () at ../mesa-demos-9.0.0/src/egl/eglut/eglut.c:267
#5  main (argc=<optimized out>, argv=<optimized out>) at ../mesa-demos-9.0.0/src/egl/opengl/eglgears.c:318

es2gears_x11, eglgears_x11 and glxgears appear to run fine in this setup.

Is it valid or useful for a Wayland application to utilise __NV_PRIME_RENDER_OFFLOAD over DRI_PRIME?

I'm unable to explain why the framebuffer would report GL_FRAMEBUFFER_UNDEFINED in this scenario, or what I can or should do about it in my own applications.


Arch Linux, Linux kernel 6.6.9, GNOME 45.2, NVIDIA GeForce GTX 970 running 545.29.06, Mesa 23.3.2

kbrenneman commented 5 months ago

The short version is that __NV_PRIME_RENDER_OFFLOAD is what the NVIDIA driver recognizes, and DRI_PRIME is what Mesa recognizes.

On the NVIDIA side, when you call eglGetPlatformDisplay, the egl-wayland library starts by using wl_drm to figure out what device the compositor is running on. If the server isn't running on an NVIDIA device, then by default, egl-wayland will leave the display unclaimed, so libglvnd moves on to Mesa.

If you set __NV_PRIME_RENDER_OFFLOAD=1 and the server is not running on an NVIDIA device, then egl-wayland will pick an NVIDIA device to use for rendering, and on each eglSwapBuffers call it'll blit the result to a shared buffer that the compositor's device can read.

Mesa's DRI_PRIME works similarly in that it can tell Mesa to use a different device for rendering than the device that the display server is running on. But since Mesa is usually the last entry in libglvnd's vendor list, if it can't use the device that the display server is running on, then it'll fall back to a software renderer instead of failing completely.

Also, if you've got more than one NVIDIA device in the system, then egl-wayland doesn't currently have a way to specify which device to use, so it'll just grab whichever one it finds first. It wouldn't be too difficult to add that, but I haven't gotten around to doing so yet. It would probably be either a separate variable like __NV_PRIME_RENDER_OFFLOAD_PROVIDER for GLX, or maybe just let you set __NV_PRIME_RENDER_OFFLOAD to a device node path or something.

kbrenneman commented 5 months ago

Note that I've got a proposal out for libglvnd to add a vendor-agnostic way to configure GPU offloading that works across EGL, GLX, and Vulkan: https://gitlab.freedesktop.org/glvnd/libglvnd/-/merge_requests/224

Plus a config file to let you configure the behavior of particular programs in a generic way: https://gitlab.freedesktop.org/glvnd/libglvnd/-/merge_requests/228

Unfortunately, I haven't been able to get much response from the Mesa side of things, and getting it to work with both NVIDIA and Mesa is kind of the bare minimum. There's not much point in a vendor-agnostic interface if only one vendor implements it, after all...

tim-rex commented 5 months ago

Great, thank you @kbrenneman, that's extremely helpful. Your proposal has a lot of merit; I look forward to seeing how it progresses!

I've dug a little further on my issue with __NV_PRIME_RENDER_OFFLOAD and it seems to be preventing Mutter or Wayland from creating a window frame surface.

I've documented that issue somewhat here, but ultimately this isn't at all an issue for Mesa; it's just a symptom.

I don't think I'm quite qualified to know where the root issue is stemming from, it could be any one of (or none of)

If you set __NV_PRIME_RENDER_OFFLOAD=1 and the server is not running on an NVIDIA device, then egl-wayland will pick an NVIDIA device to use for rendering, and on each eglSwapBuffers call it'll blit the result to a shared buffer that the compositor's device can read.

That's interesting, as my own code will run despite not having a window frame, but the resulting GL context appears to have an undefined default framebuffer... though as I understand it, that's more likely due to not having a window frame surface in the first place.

kbrenneman commented 5 months ago

Hm, the missing wl_surface.frame might suggest that something in egl-wayland's eglSwapBuffers implementation is getting stuck. Just from looking at it, though, I don't know where it might be getting stuck or why.

kbrenneman commented 5 months ago

@tim-rex -- Is that hang specific to Mutter, or can you reproduce it on other compositors, too?

kbrenneman commented 5 months ago

That's interesting, as my own code will run despite not having a window frame but the resulting GL context appears to have an undefined default framebuffer.. though as I understand it, that's more likely due to not having a window frame surface in the first place.

Internally, egl-wayland allocates a buffer in the client to render into, and then it's supposed to use zwp_linux_dmabuf_v1 to send that to the server. So, you'll have the same default framebuffer in OpenGL either way, but what appears to be happening here is that the server never receives that buffer.

tim-rex commented 5 months ago

@tim-rex -- Is that hang specific to Mutter, or can you reproduce it on other compositors, too?

At present, I don't have another compositor running that supports multi-GPU between nvidia/amdgpu (which I think precludes any valid testing of __NV_PRIME_RENDER_OFFLOAD).

I'm looking at kwin/plasma right now, but it's not happy about it. I'm not aware of anything else that supports multi-GPU, but I'm happy to take suggestions!

Edit: Got my kwin/plasma setup working, can reproduce

tim-rex commented 5 months ago

Update... @kbrenneman, I can confirm the same behaviour occurs on kwin/plasma (5.27.10) when __NV_PRIME_RENDER_OFFLOAD is enabled.

kbrenneman commented 5 months ago

If mutter and kwin have the same behavior, then it's probably not a problem in the compositor. And I've at least gotten __NV_PRIME_RENDER_OFFLOAD to work with Mutter on Intel+NVIDIA systems.

Normally, if presentation wasn't working, my first thought would be that the ATI device doesn't accept a pitch linear dma-buf, or that it has limited support which the compositor doesn't correctly check for. But, if that was the problem, then there would still be a zwp_linux_buffer_params_v1 stanza from the client. From those logs, it's not even getting far enough for format/modifier compatibility to matter.

I'd throw it into a debugger, except that I don't have that hardware configuration available to test on.

kbrenneman commented 5 months ago

@tim-rex -- Do you know if the application is getting far enough to call eglSwapBuffers?

tim-rex commented 5 months ago

It does get to eglSwapBuffers, and on a normal invocation it is precisely that call which instigates the frame.

  -> wl_surface@10.frame(new id wl_callback@34)
  -> zwp_linux_dmabuf_v1@9.create_params(new id zwp_linux_buffer_params_v1@35)
  -> zwp_linux_buffer_params_v1@35.add(fd 20, 0, 0, 1216, 16777215, 4294967295)
  -> zwp_linux_buffer_params_v1@35.create_immed(new id wl_buffer@36, 300, 300, 808669784, 0)
  -> zwp_linux_buffer_params_v1@35.destroy()
  -> wl_surface@10.attach(wl_buffer@36, 0, 0)
  -> wl_surface@10.damage(0, 0, 2147483647, 2147483647)
  -> wl_surface@10.commit()

When I look closer at a diff between the Wayland debug output (and I'm not sure how precisely I should expect these to agree, given the different EGL implementations), I notice the NV_PRIME logs show an additional wl_registry.bind against wp_presentation, but otherwise only a few differences. Whether or not those differences are meaningful, I can't say.

I've attached those wayland logs here, tidied up for ease of diff analysis:

not_nv_prime.txt

nv_prime.txt

tim-rex commented 5 months ago

How relevant is the zwp_linux_dmabuf_feedback_v1.destroy() that appears in the non nv_prime logs? That line is emitted just as eglInitialize() returns.


tim-rex commented 5 months ago

Just for fun, I thought to see if @Molytho's PR (https://github.com/NVIDIA/egl-wayland/pull/100) might circumvent this issue. Unfortunately, I'm now seeing a seg fault in this scenario during wlEglCreatePlatformWindowSurfaceHook.

Note that the PR seems fine without __NV_PRIME_RENDER_OFFLOAD, but it's interesting that I now encounter issues a little earlier in the execution.

$ __NV_PRIME_RENDER_OFFLOAD=1 WAYLAND_DEBUG=1 LD_PRELOAD=/home/timk/workspace/git/egl-wayland/.libs/libnvidia-egl-wayland.so.1.1.13 gdb git/mesa-demos-git/demos/builddir/src/egl/opengl/eglgears
GNU gdb (GDB) 13.2
...
...
...
[2824548.759] gtk_shell1@20.capabilities(0)
[2824548.766] wl_seat@27.capabilities(3)
[2824548.769]  -> wl_seat@27.get_pointer(new id wl_pointer@16)
[2824548.778]  -> zwp_pointer_gestures_v1@21.get_swipe_gesture(new id zwp_pointer_gesture_swipe_v1@11, wl_pointer@16)
[2824548.781]  -> zwp_pointer_gestures_v1@21.get_pinch_gesture(new id zwp_pointer_gesture_pinch_v1@9, wl_pointer@16)
[2824548.785]  -> wl_seat@27.get_keyboard(new id wl_keyboard@7)
[2824548.791] wl_seat@27.name("seat0")
[2824548.793] wl_callback@33.done(5704)
[2824548.796]  -> wl_registry@2.bind(11, "xdg_wm_base", 6, new id [unknown]@33)
[2824581.888]  -> xdg_wm_base@18.get_xdg_surface(new id xdg_surface@16, wl_surface@10)                                                                              
[2824581.901]  -> xdg_surface@16.get_toplevel(new id xdg_toplevel@15)
[2824581.908]  -> xdg_toplevel@15.set_app_id("eglgears")
[2824581.912]  -> xdg_toplevel@15.set_title("eglgears")
[2824581.916]  -> wl_surface@10.commit()

Thread 1 "eglgears" received signal SIGSEGV, Segmentation fault.
0x00007ffff7dfbaa9 in wl_list_insert (list=0x10, elm=elm@entry=0x5555557971f0) at ../wayland-1.22.0/src/wayland-util.c:48
48              elm->next = list->next;                                                                                                                             
(gdb) bt full
#0  0x00007ffff7dfbaa9 in wl_list_insert (list=0x10, elm=elm@entry=0x5555557971f0) at ../wayland-1.22.0/src/wayland-util.c:48
#1  0x00007ffff7dfbb32 in wl_proxy_create_wrapper (proxy=0x5555555f11b0) at ../wayland-1.22.0/src/wayland-client.c:2456
        wrapped_proxy = 0x5555555f11b0
        wrapper = 0x5555557971a0
#2  0x00007ffff7fba470 in wlEglCreatePlatformWindowSurfaceHook () at /home/timk/workspace/git/egl-wayland/.libs/libnvidia-egl-wayland.so.1.1.13
#3  0x00007ffff74a864b in  () at /usr/lib/libEGL_nvidia.so.0
#4  0x00007ffff74a86dd in  () at /usr/lib/libEGL_nvidia.so.0
#5  0x00007ffff7448b50 in  () at /usr/lib/libEGL_nvidia.so.0
#6  0x0000555555559df1 in _eglutCreateWindow (title=0x55555555d010 "eglgears", x=0, y=0, w=300, h=300) at ../src/egl/eglut/eglut.c:164
        win = 0x555555629150
        context_attribs = {12344, 21845, 1431680976, 21845}
        api = 12450
        i = 0
#7  0x000055555555a345 in eglutCreateWindow (title=0x55555555d010 "eglgears") at ../src/egl/eglut/eglut.c:309
        win = 0x7fffffffe4a8
#8  0x00005555555597ae in main (argc=1, argv=0x7fffffffe4a8) at ../src/egl/opengl/eglgears.c:309

It is at this point that I'd expect to see:

[3258872.108]  -> zwp_linux_dmabuf_v1@9.get_surface_feedback(new id zwp_linux_dmabuf_feedback_v1@24, wl_surface@10)
Molytho commented 5 months ago

This looks like an issue in my patch. Apparently the queue attached to the wl_proxy gets NULL at some point. I have to look into this. Well, found it ...

tim-rex commented 5 months ago

This looks like an issue in my patch. Apparently the queue attached to the wl_proxy gets NULL at some point. I have to look into this. Well, found it ...

Confirmed the seg fault is fixed with the latest push, thanks @Molytho. The original behaviour of this issue is unchanged.

tim-rex commented 5 months ago

It seems like perhaps there is a more fundamental issue at play here. I just booted up with all other devices blacklisted, using only the NVIDIA drivers.

When I run eglgears_wayland (no need for __NV_PRIME_RENDER_OFFLOAD), I find that while it does create the window frame and the first frame renders, the process immediately blocks on poll() in precisely the same way as in the initial issue description.

Further, the process becomes unblocked as a result of damage events (e.g. mouse cursor interaction) and immediately blocks again.

This feels like potentially a mutex issue in the NVIDIA driver that is released on damage but otherwise prevents normal frame updates (and in the __NV_PRIME case, perhaps this occurs ahead of window frame creation).

Same behaviour for es2gears_wayland

Lastly, I'm finding that eglgears_x11 is failing to create a dri2 screen and falls back to Mesa. I had been experiencing this when running multiple GPUs and relying on DRI_PRIME, but had chalked it up to implicit modifier support when offloading to an RX 580... evidently, that isn't the issue here.

Given this rather odd behaviour, I've attached an nvidia-bug-report nvidia-bug-report.log.gz

erik-kz commented 5 months ago

This is a known issue. There are a few applications out there, eglgears_wayland among them, that call poll on the Wayland display socket directly as part of their event loop. Another example is weston-subsurfaces.

This is problematic because egl-wayland creates a background thread that also reads from the Wayland socket.

When there are multiple threads reading from the socket at the same time, they all need to use the wl_display_prepare_read and wl_display_read_events functions to avoid racing. We use these functions in egl-wayland, but the applications in question do not.

It's debatable whether this ought to be considered their bug. Should single-threaded applications still be required to read from the socket in a thread-safe way in case some library happens to create a background thread? I'm inclined to say no.

Fortunately, the pending explicit sync protocol https://gitlab.freedesktop.org/wayland/wayland-protocols/-/merge_requests/90 should eliminate the need for this background thread. We're currently working on implementing support for it.

tim-rex commented 5 months ago

Ahhh!! Okay, that explains a lot! Cheers @erik-kz

This partially explains why I don't see this issue in my own application. (I'm not directly polling on the socket, though I am running multithreaded and utilising wl_display_dispatch on one of those threads.) Perhaps I got lucky, but I've noted previously that I should perhaps be using wl_display_prepare_read / dispatch_pending.

Thanks also for the info regarding the explicit sync implementation.


The only piece that remains unexplained (for me) is why apps running under XWayland are unable to establish an EGL context on NVIDIA (and fall back to Mesa). This happens irrespective of multi-GPU, or with only the NVIDIA drivers loaded.

Those same applications (mine included) run fine under native X11.

In both cases (X11 and Wayland), if I try to circumvent Mesa and use the NVIDIA ICD directly (using __EGL_VENDOR_LIBRARY_FILENAMES), I get the following:

[timk@archon ~]$ __EGL_VENDOR_LIBRARY_FILENAMES=/usr/share/glvnd/egl_vendor.d/10_nvidia.json EGL_LOG_LEVEL=debug eglgears_x11
EGLUT: failed to initialize EGL display

In my own application, I'm finding that eglGetPlatformDisplayEXT() returns EGL_NO_DISPLAY and of course a subsequent eglInitialize() fails.

For the sake of clarity, I'm requesting an EGL_EXT_platform_x11 display (not XCB, which I'm aware is not currently supported for XWayland).

cubanismo commented 5 months ago

It's debatable whether this ought to be considered their bug. Should single-threaded applications still be required to read from the socket in a thread-safe way in case some library happens to create a background thread? I'm inclined to say no.

I agree it's debatable, but I would assert that all applications should be required to use the socket in a thread-safe way. NVIDIA/egl-wayland happens to use a background thread in our current implementation, which has other issues with protocol correctness and will hopefully go away soon.

That aside, it seems reasonable that other drivers or libraries may want to use background threads to listen for events on Wayland connections for whatever reason. All kinds of things were way harder than they had to be with libX11 because of the XInitThreads() situation, and in the end, I believe the maintainers decided to auto-call XInitThreads() to make applications thread-safe in general. I think it'd be a regression to go back to a situation where applications can decide whether their usage of the windowing system is thread-safe or not. There's no great way to enforce that AFAIK, other than noting it in the docs, but I think that'd probably be worth doing.