NVIDIA / egl-wayland

The EGLStream-based Wayland external platform

wayland application blocking on poll() when requesting SRGB colorspace #102

Closed tim-rex closed 2 months ago

tim-rex commented 5 months ago

I'm seeing an unexpected interaction between two seemingly separate issues, which I had not observed previously.

Beta drivers 550.40.07, Arch Linux, kernel 6.7.1, NVIDIA GTX 970

Those two issues are:
https://github.com/NVIDIA/egl-wayland/issues/85
https://github.com/NVIDIA/egl-wayland/issues/98#issuecomment-1892920697

Previously, my Wayland event loop looked like the code below. It is not thread safe and risks exposing the issue described in #98, where the NVIDIA drivers introduce threaded behaviour.

while (wl_display_dispatch(wayland_display) != -1 && running) {
    check_errno();
}

if (errno && errno != EAGAIN)
{
    int errcode = wl_display_get_error(wayland_display);
    debug_printf(LOGLEVEL_ERROR, "Wayland bailed!! display error =%d", errcode);
    return -1;
}

At that time (using previous driver versions), requesting EGL_GL_COLORSPACE_SRGB would result in the issue described in #85. I could work around this by requesting a linear colorspace instead, which I've been doing ever since.

Since raising #98, I have modified the event loop to be thread safe, based on examples in the wild, using wl_display_prepare_read and wl_display_dispatch_pending as described in #98:


struct pollfd fds[2] =
{
    { wl_display_get_fd(wayland_display), POLLIN },
    { -1, POLLIN }
};

while (running)
{
    while (wl_display_prepare_read(wayland_display) != 0)
        wl_display_dispatch_pending(wayland_display);

    wl_display_flush(wayland_display);

    int ret = poll(fds, sizeof(fds) / sizeof(fds[0]), -1);
    if (ret == -1)
    {
        // poll error
        check_errno();
        wl_display_cancel_read(wayland_display);
    }
    else
        wl_display_read_events(wayland_display);

    wl_display_dispatch_pending(wayland_display);
}

What's interesting is that if I now request an EGL_GL_COLORSPACE_SRGB context, the application blocks on poll() in the supposedly thread-safe event loop, while the non-thread-safe event loop restores the behaviour seen in #85.

I continue to see correct behaviour when requesting a linear colourspace context using either event loop.

It seems odd that poll() would only block when I request an SRGB colourspace.

I'm raising this independently of #85 and #98 to avoid confusing the issues (if they are indeed entirely unrelated).

erik-kz commented 5 months ago

As observed in #85, requesting an SRGB color space triggers a protocol error which ultimately causes the Wayland socket to be closed. If another thread is polling the socket when this happens, I believe it's possible for the poll call to hang indefinitely.
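
One defensive measure, not confirmed anywhere in this thread as a fix, is to avoid an infinite poll() timeout and to inspect revents for hang-up or error conditions instead of assuming the descriptor is readable. The sketch below uses only POSIX calls. The names wait_readable, wait_result and demo_peer_closed are hypothetical helpers made up for illustration and are not part of libwayland or egl-wayland.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not a libwayland type. */
enum wait_result {
    WAIT_ERROR    = -1, /* poll() itself failed */
    WAIT_TIMEOUT  = 0,  /* nothing happened within timeout_ms */
    WAIT_READABLE = 1,  /* data is available to read */
    WAIT_HANGUP   = 2   /* peer closed, socket error, or invalid fd */
};

/* Wait for fd to become readable, but never block forever. */
enum wait_result wait_readable(int fd, int timeout_ms)
{
    /* A negative timeout would mean "block indefinitely", which is
     * exactly what this sketch is trying to avoid. */
    assert(timeout_ms >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret;

    /* Retry if interrupted by a signal. Note that the full timeout
     * restarts on each retry; good enough for a sketch. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret == -1 && errno == EINTR);

    if (ret == -1)
        return WAIT_ERROR;
    if (ret == 0)
        return WAIT_TIMEOUT;

    /* Check failure conditions before POLLIN: a closed peer can
     * report POLLIN and POLLHUP together. */
    if (pfd.revents & (POLLERR | POLLHUP | POLLNVAL))
        return WAIT_HANGUP;

    return (pfd.revents & POLLIN) ? WAIT_READABLE : WAIT_TIMEOUT;
}

/* Small demonstration on a local socket pair whose peer end is closed. */
enum wait_result demo_peer_closed(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return WAIT_ERROR;

    close(sv[1]);                               /* simulate the peer going away */
    enum wait_result r = wait_readable(sv[0], 100);
    close(sv[0]);
    return r;
}
```

In an event loop like the one in the original report, a result other than WAIT_READABLE would presumably be handled by calling wl_display_cancel_read() and then checking wl_display_get_error() before deciding whether to keep running. A timeout does not address the underlying protocol error; it only keeps the loop from waiting forever.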