Open chergert opened 2 years ago
lgtm, thanks @chergert :)
Okay, fixed a coordinate issue with the rectangle which I was finally able to see now that I added a second commit, which passes damage regions across from the producer to the consumer thread w/ a lock-free ring-buffer (3 frames deep, up to 32 rectangles per frame).
I decided on such a large rectangle count because I'm regularly seeing rectangles from GTK in the 20's with very occasional peeks to 30.
A depth of 3 frames seem to be plenty with experimental testing.
Thank you very much for the patch. Two minor points
It's probably not hugely important in practice, but for the lock-free ring buffer, would we be able to use memory_order_acqure / memory_order release for the first and second atomic_thread_fence calls instead of memory_order_seq_cst for both? Admittedly I don't have a lot of experience with the stdatomics API so if I'm wrong feel free to tell me, but in theory that might be slightly faster, right?
Indentation is off in WlEglStreamDamageBufferRec
Otherwise I've applied the patch locally and everything looks good. I've also run it through our automated test system and that came back clean. Note that I will need to submit the change to our internal Perforce depot before merging on GitHub (sorry, don't have a great system for handling external contributions at the moment).
would we be able to use memory_order_acqure / memory_order release for the first and second atomic_thread_fence calls instead of memory_order_seq_cst for both?
I too don't have much experience with these APIs other than making our ring buffer work for Sysprof's perf_event integration. That code was just looking for a replacement for __sync_synchronize()
which is definitely the heavy-handed approach to correctness.
After going through and reading the semantics of these, I think you're right and some quick testing shows it working well.
I'll update the patches quick, and thanks for the review!
Note that I will need to submit the change to our internal Perforce depot before merging on GitHub
And no worries on this, I know the drill with regards to corporate integrations. I had to deal with the same sort of issues at VMware years ago (with perforce no less).
Turns out _Atomic
is not what we want actually causes issues (at least with GTK 4) so reverted back to volatile
.
Turns out
_Atomic
is not what we want actually causes issues (at least with GTK 4) so reverted back tovolatile
.
Could you elaborate? volatile generally isn't meant for inter-thread synchronization purposes from what I've read.
Could you elaborate? volatile generally isn't meant for inter-thread synchronization purposes from what I've read.
I don't believe we're actually relying on volatile
here either given how it's using the other barrier enforcing primitives. So it's more likely that we could drop volatile
than needing to add _Atomic
.
I was asking what issues _Atomic causes and how you determined it isn't what we want here.
I was asking what issues _Atomic causes and how you determined it isn't what we want here.
I can't answer that for you, I've never used these features from the language. I've just done the ring buffer the same way it's done for perf_event mmap streams.
It sounds like there isn't sufficient shared understanding to choose the correct solution then. "Because another app does it this way" isn't suitable in lieu of actually being able to describe why the code you're requesting to contribute is correct when questioned.
Honestly, it doesn't matter too much to me anymore because the damage thread is already broken by design. The driver constantly locks up on Wayland and so while adding a working eglSwapBuffersWithDamage()
makes it lock up less for me, I can't fix the completely brokenness underneath it which is not publicly available.
I was asking what issues _Atomic causes and how you determined it isn't what we want here.
From what I've read from https://en.cppreference.com/w/c/language/atomic, it would look like using _Atomic should be fine, and what was causing me to revert my change is just fundamental breakage in the underlying driver for which I can't work around.
Previously, wlEglSendDamageEvent used the entire surface as damage to the compositor but in the wrong coordinate system. wl_surface_damage() requires coordinates for the surface which could be scaled while wl_surface_damage_buffer() expects buffer coordinates which is what surface->width and surface->height represent.
This ensures that the parameters to eglSwapBuffersWithDamage() are passed along to the compositor as well. The coordinate system is flipped between eglSwapBuffersWithDamage() and wl_surface_damage_buffer() which is handled as well.
Signed-off-by: Christian Hergert chergert@redhat.com