egl-swap: provide damage rectangles to wl_surface

chergert commented 2 years ago

Previously, wlEglSendDamageEvent used the entire surface as damage to the compositor but in the wrong coordinate system. wl_surface_damage() requires coordinates for the surface which could be scaled while wl_surface_damage_buffer() expects buffer coordinates which is what surface->width and surface->height represent.

This ensures that the parameters to eglSwapBuffersWithDamage() are passed along to the compositor as well. The coordinate system is flipped between eglSwapBuffersWithDamage() and wl_surface_damage_buffer() which is handled as well.

Signed-off-by: Christian Hergert chergert@redhat.com

fooishbar commented 2 years ago

lgtm, thanks @chergert :)

chergert commented 2 years ago

Okay, fixed a coordinate issue with the rectangle which I was finally able to see now that I added a second commit, which passes damage regions across from the producer to the consumer thread w/ a lock-free ring-buffer (3 frames deep, up to 32 rectangles per frame).

I decided on such a large rectangle count because I'm regularly seeing rectangles from GTK in the 20's with very occasional peeks to 30.

A depth of 3 frames seem to be plenty with experimental testing.

erik-kz commented 2 years ago

Thank you very much for the patch. Two minor points

It's probably not hugely important in practice, but for the lock-free ring buffer, would we be able to use memory_order_acqure / memory_order release for the first and second atomic_thread_fence calls instead of memory_order_seq_cst for both? Admittedly I don't have a lot of experience with the stdatomics API so if I'm wrong feel free to tell me, but in theory that might be slightly faster, right?
Indentation is off in WlEglStreamDamageBufferRec

Otherwise I've applied the patch locally and everything looks good. I've also run it through our automated test system and that came back clean. Note that I will need to submit the change to our internal Perforce depot before merging on GitHub (sorry, don't have a great system for handling external contributions at the moment).

chergert commented 2 years ago

would we be able to use memory_order_acqure / memory_order release for the first and second atomic_thread_fence calls instead of memory_order_seq_cst for both?

I too don't have much experience with these APIs other than making our ring buffer work for Sysprof's perf_event integration. That code was just looking for a replacement for __sync_synchronize() which is definitely the heavy-handed approach to correctness.

After going through and reading the semantics of these, I think you're right and some quick testing shows it working well.

I'll update the patches quick, and thanks for the review!

chergert commented 2 years ago

Note that I will need to submit the change to our internal Perforce depot before merging on GitHub

And no worries on this, I know the drill with regards to corporate integrations. I had to deal with the same sort of issues at VMware years ago (with perforce no less).

chergert commented 2 years ago

Turns out _Atomic is not what we want actually causes issues (at least with GTK 4) so reverted back to volatile.

cubanismo commented 2 years ago

Turns out _Atomic is not what we want actually causes issues (at least with GTK 4) so reverted back to volatile.

Could you elaborate? volatile generally isn't meant for inter-thread synchronization purposes from what I've read.

chergert commented 2 years ago

Could you elaborate? volatile generally isn't meant for inter-thread synchronization purposes from what I've read.

I don't believe we're actually relying on volatile here either given how it's using the other barrier enforcing primitives. So it's more likely that we could drop volatile than needing to add _Atomic.

cubanismo commented 2 years ago

I was asking what issues _Atomic causes and how you determined it isn't what we want here.

chergert commented 2 years ago

I was asking what issues _Atomic causes and how you determined it isn't what we want here.

I can't answer that for you, I've never used these features from the language. I've just done the ring buffer the same way it's done for perf_event mmap streams.

cubanismo commented 2 years ago

It sounds like there isn't sufficient shared understanding to choose the correct solution then. "Because another app does it this way" isn't suitable in lieu of actually being able to describe why the code you're requesting to contribute is correct when questioned.

chergert commented 2 years ago

Honestly, it doesn't matter too much to me anymore because the damage thread is already broken by design. The driver constantly locks up on Wayland and so while adding a working eglSwapBuffersWithDamage() makes it lock up less for me, I can't fix the completely brokenness underneath it which is not publicly available.

chergert commented 2 years ago

I was asking what issues _Atomic causes and how you determined it isn't what we want here.

From what I've read from https://en.cppreference.com/w/c/language/atomic, it would look like using _Atomic should be fine, and what was causing me to revert my change is just fundamental breakage in the underlying driver for which I can't work around.

NVIDIA / egl-wayland

egl-swap: provide damage rectangles to wl_surface #50