NVIDIA / egl-wayland

The EGLStream-based Wayland external platform
MIT License
293 stars 47 forks source link

Defer resizes until next SwapBuffers (work around #52) #53

Closed sulix closed 2 years ago

sulix commented 2 years ago

As explained in #52 and https://github.com/libsdl-org/SDL/pull/4821, several games are broken under nVidia/Wayland due to calling wl_egl_window_resize() from a different thread.

This change works around this by deferring the resize until eglSwapBuffers(), which fixes the aforementioned games.

I'm not sure if this is the right solution to this problem — and the implementation could use a lot of cleanup, even so, but it seems to be a marked improvement for me.

sulix commented 2 years ago

Pushed a new version with @flibitijibibo's suggestions and a couple of other minor style fixes. It's possible there's a nicer way of going about this still (maybe by saving off the desired size, and comparing that, rather than having an isResized variable, which I think is closer to what Mesa does…), but this seems to work for now.

As a totally unscientific possibility, this may have fixed (or made more rare) this hang in KDE's plasmashell. It's difficult to know for sure, as it happens quite randomly and rarely, but I haven't noticed it since making this change. Of course, plasmashell still has several other bugs, and this could be coincidence still…

erik-kz commented 2 years ago

Thank you for the patch, and sorry for the very slow review. Apart from the one minor question above I think it looks good.

For what it's worth, I first noticed this issue in plasmashell which also does wl_egl_window_resize from a different thread and tried a very similar fix to this. However it didn't seem to be sufficient in that case. If I recall correctly, the reason was that resizing the surface right before eglSwapBuffers would discard any rendering done to the old surface, so you'd just get a black window unless / until something prompts it to redraw again.

However I imagine this isn't as much of an issue for SDL applications since they would typically be rendering every frame, so at the very least it's a strict improvement on the status quo.

When it was last discussed internally, I believe the consensus was that to really fix the issue properly we would need an EGL extension to allow you to explicitly resize a stream. There was a draft version of such a thing floating around but it never ended up getting finalized. But anyway, until that happens this workaround will be good to have.

sulix commented 2 years ago

Thanks for having a look.

I'm afraid not sure what you mean by the "minor question above" — could you clarify?

As for the plasmashell issue, I'm pretty confident this patch is solving at least one of them: I was getting regular hangs before applying it, and am not anymore. However, it turns out there was also a non-nVidia-specific Qt issue, which had exactly the same symptoms of a hang. I've got that patch applied too, and no hangs. (Though, as mentioned, there are a few other cases of recently-resized things being mysteriously black and other visual corruption in Plasma, though nothing that seems like a regression…)

erik-kz commented 2 years ago

I'm afraid not sure what you mean by the "minor question above" — could you clarify?

Apologies, I was referring to my comment about the wl_surface_commit call, but I just realized it was still marked as "pending".

sulix commented 2 years ago

I've uploaded a new version with the wl_surface_commit call removed: everything still seems to work.

erik-kz commented 2 years ago

Gotcha, thanks. I think the updated patch looks good. I've run it through our internal automated test system and everything came back clean. I just need to get a +1 from another NVIDIA engineer.

sulix commented 2 years ago

Glad to hear this passed the tests.

I've been running this for a while, and haven't noticed anything either. I've only had plasmashell's rendering hang on me once since, and it was at the same time as a bunch of other applications died with messages like this, so I'm assuming its unrelated:

[drm:nv_drm_gem_alloc_nvkms_memory_ioctl [nvidia_drm]] *ERROR* [nvidia-drm] [GPU ID 0x00000100] Failed to allocate NVKMS memory for GEM object
steam[5144]: segfault at 0 ip 00000000f62f0b5d sp 00000000ff9a85c0 error 4 in libnvidia-glcore.so.510.60.02[f54c4000+1266000]
NVRM: Xid (PCI:0000:01:00): 31, pid=32703, Ch 00000030, intr 10000000. MMU Fault: ENGINE GRAPHICS GPCCLIENT_PROP_0 faulted @ 0x0_0489d000. Fault is of type FAULT_PDE ACCESS_TYPE_WRITE