ValveSoftware / gamescope

SteamOS session compositing window manager
Other
2.82k stars 185 forks source link

steamcompmgr: use GetGeometry() more efficiently #1353

Closed sharkautarch closed 3 weeks ago

sharkautarch commented 3 weeks ago

Followup Pr to commit c5a5ba00b655caa1c10656dfe3ae09e281ec0544 From checking how GetGeometry() is compiled w/ gcc, it seems that the compiler does not know it is safe to automatically re-use the 128bit Rect returned from GetGeometry()

So unfortunately, doing

uint32_t sourceWidth = window->GetGeometry().nWidth;
uint32_t sourceHeight = window->GetGeometry().nHeight;

Would result in fetching the 128bit rect, getting the width from it And then re-fetching the 128bit rect again to get the height from it

Joshua-Ashton commented 3 weeks ago

Not sure it's that worth it to be honest, we are talking like a few cycles here. Don't think the microoptimization is really worth it.

sharkautarch commented 3 weeks ago

Not sure it's that worth it to be honest, we are talking like a few cycles here. Don't think the microoptimization is really worth it.

Ok, I've simplified the PR: somehow a slight refactor of GetGeometry() that leverages __builtin_unreachable() allows the compiler to optimize the calls to GetGeometry() Note that __builtin_unreachable() acts similar to an assert fail when doing a debug build, but is treated as an unreachable branch when building release builds (https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-funreachable-traps)

EDIT: it seems like __builtin_unreachable() wasn't entirely necessary, seems like simply reducing GetGeometry() to just having a single return is sufficient for the compiler to optimize stuff better.

Joshua-Ashton commented 3 weeks ago

Are you building with optimizations? These two should really be equivelant...

sharkautarch commented 3 weeks ago

Are you building with optimizations? These two should really be equivelant...

Yes, I've tested both versions w/ both compiler optimization and LTO on.

And yes, it is really weird that this small change actually makes a difference

Tho I've only tested what the disassembly looks like when compiling both versions w/ gcc (gcc version 14.1.1 20240522) so it could be possible that this is a missed optimization opportunity on gcc's end...

My speculation for why this makes a difference is either:

I will say that I had originally thought that adding __attribute__((pure)) to GetGeometry() would be the simplest way to get the compiler to optimize stuff better, but when I tested adding that attribute, there wasn't any difference in the compiler's output

sharkautarch commented 3 weeks ago

@Joshua-Ashton I've now realized that the reason why the refactor allowed gcc to optimize more, was that it simply caused gcc to decide to inline GetGeometry() more, and whenever gcc inlines the function, it optimizes out all of the redundant loads/stores. (It was easier to see this when I looked at the gimple IR code that gets dumped when adding -fdump-tree-optimized to the command line options for gcc)

closing this PR since this is entirely due to the behavior of gcc but note that you can force GetGeometry() to be inlined by adding __attribute__((always_inline)) to it (Rect __attribute__((always_inline)) GetGeometry() const { ... }) Note that this attribute is supported by both gcc and clang