Inlining attributes? - Githubissues

vittorioromeo commented 2 months ago

Thank you for the very nice article! I was surprised to see no mention of attributes that can force inlining, even in debug mode.

Could you repeat your benchmarks using the original lambda-based solution, adding attributes such as gnu::always_inline and gnu::flatten, which should be supported by both GCC and Clang?

I'm confident MSVC has an equivalent too.

Would be nice to see if just a few attributes can solve the problem without having to sacrifice the abstraction level.
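For readers unfamiliar with these attributes, here is a minimal sketch of how they are typically applied. The `Vec4` type and the function names are hypothetical, made up for illustration and not taken from the article; only the attribute spellings come from the comment above.

```cpp
#include <cassert>

// Hypothetical 4-component vector, for illustration only.
struct Vec4 {
    float v[4];
};

// always_inline: ask GCC/Clang to inline this function into its callers
// even when optimization is disabled (-O0).
[[gnu::always_inline]] inline Vec4 add(const Vec4& a, const Vec4& b) {
    Vec4 r{};
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

[[gnu::always_inline]] inline float dot(const Vec4& a, const Vec4& b) {
    float s = 0.0f;
    for (int i = 0; i < 4; ++i) s += a.v[i] * b.v[i];
    return s;
}

// flatten: ask the compiler to inline, where it can, the calls made
// inside this function's body (here: add and dot).
[[gnu::flatten]] inline float sum_then_dot(const Vec4& a, const Vec4& b) {
    return dot(add(a, b), b);
}

// Small usage check: (1+1, 2+1, 3+1, 4+1) . (1, 1, 1, 1) = 2 + 3 + 4 + 5.
inline void example_usage() {
    const Vec4 a{{1.0f, 2.0f, 3.0f, 4.0f}};
    const Vec4 b{{1.0f, 1.0f, 1.0f, 1.0f}};
    assert(sum_then_dot(a, b) == 14.0f);
}
```

Whether this actually removes the call overhead in an unoptimized build is exactly what the benchmark would need to show.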

aras-p commented 2 months ago

Yeah should have tested them, even if I know that "I'm confident MSVC has an equivalent too" is false, which cuts out a major compiler :/

MSVC today simply does not have a way to inline anything in the default /Od build configuration (the only way to inline is... C macros). The next level it has is /Ob1, which tells the compiler "inline anything marked as inline, __forceinline or declared inline inside a class", which is a large step from "inline nothing". It would be helpful if there was a setting in between, yes (there's an outstanding feature request).
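A common way to paper over the spelling differences is a small portability macro. The sketch below is an assumption about how one might write it (the macro name `MY_FORCE_INLINE` and the `mix` function are hypothetical); it does not change the MSVC behavior described above, where the keyword has no effect under /Od.

```cpp
#include <cassert>

// Hypothetical portability macro; the name is made up for this sketch.
#if defined(_MSC_VER)
    // MSVC keyword. Under /Od (which defaults to /Ob0) it is not honored;
    // it only starts to matter from /Ob1 upwards.
    #define MY_FORCE_INLINE __forceinline
#elif defined(__GNUC__) || defined(__clang__)
    // GCC and Clang honor this even at -O0.
    #define MY_FORCE_INLINE inline __attribute__((always_inline))
#else
    // Unknown compiler: fall back to a plain inline hint.
    #define MY_FORCE_INLINE inline
#endif

// Linear interpolation between a and b.
MY_FORCE_INLINE float mix(float a, float b, float t) {
    return a + (b - a) * t;
}

inline void example_usage() {
    assert(mix(0.0f, 10.0f, 0.5f) == 5.0f);
}
```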

I have quickly checked Clang 17 here, using various combinations that it can do:

[[gnu::always_inline, gnu::flatten]] attributes on functions,
[[gnu::always_inline, gnu::flatten]] on the lambdas used in unroll (needs to turn on C++23 mode, which for Blender codebase is still not there -- it is C++17 right now)
[[clang::always_inline]] attribute on the call sites
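
As a rough illustration of the last two variants in the list above, assuming a simplified `unroll` helper (hypothetical, not the actual Blender code), the annotations would look something like this. The guards are only there so the sketch also compiles on compilers or language modes that lack the feature.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an unroll helper: calls f(0), f(1), ..., f(N-1).
template <std::size_t N, typename F>
inline void unroll(F&& f) {
    for (std::size_t i = 0; i < N; ++i) f(i);
}

// Variant: attribute on the lambda passed to unroll.
inline void add4(float* dst, const float* a, const float* b) {
#if __cplusplus >= 202302L
    // C++23 syntax: the attribute applies to the lambda's call operator.
    auto body = [&] [[gnu::always_inline]] (std::size_t i) { dst[i] = a[i] + b[i]; };
#else
    // Pre-C++23 fallback for this sketch: same lambda, no attribute.
    auto body = [&](std::size_t i) { dst[i] = a[i] + b[i]; };
#endif
    unroll<4>(body);
}

// Variant: attribute on the call site (a Clang statement attribute).
inline float sum4(const float* a) {
    float s = 0.0f;
    auto acc = [&](std::size_t i) { s += a[i]; };
#if defined(__clang__) && __clang_major__ >= 15
    [[clang::always_inline]] unroll<4>(acc);
#else
    unroll<4>(acc);
#endif
    return s;
}

inline void example_usage() {
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    float d[4] = {};
    add4(d, a, b);
    assert(d[0] == 11.0f && d[3] == 44.0f);
    assert(sum4(a) == 10.0f);
}
```
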

And the results, as slowdown compared to "Release" (/O2, no asserts):

Unroll+Lambdas, asserts, /Od: 129x
Unroll+Lambdas, asserts, /Od plus various ways to force inlining: all of them and combinations of them: 95x
Explicit xyzw accesses, asserts, /Od: 17x
Raw C code: 6x

So yes, it helps, but only a tiny bit :( And it does not help MSVC at all, due to above.

vittorioromeo commented 2 months ago

Thank you so much for trying it out! I'm really saddened by the fact that it does not make a huge difference.

I would still give a few more attributes a try, especially [[gnu::optimize(/* level */)]] that can be applied on a per-function basis.

I guess that always optimizing lower level functions such as mathematical vector operations should be acceptable even in non-optimized debug builds, right?

aras-p commented 2 months ago

> I guess that always optimizing lower level functions such as mathematical vector operations should be acceptable even in non-optimized debug builds, right?

I would agree, yes. In general, in a large codebase you usually have a set of "core libraries" that are kinda written/debugged once and then they are fairly stable for years, without much churn on them. Having them be "optimized" even in an otherwise "debug" build would make sense.

It makes it harder with inlined/header-only/templated code, since it is not like you can "build" these libraries once and have the rest of the code "link" to them; they are instantiated by the call sites. Which are outside of said "core library" to begin with.

vittorioromeo commented 2 months ago

I tried annotating everything relevant with [[gnu::always_inline, gnu::flatten, gnu::optimize("-O3")]]

[[gnu::always_inline, gnu::flatten]] made a big difference, but nowhere near raw C code.

[[gnu::optimize("-O3")]] didn't actually do much at all, and isn't even supported by Clang -- turns out Clang has no way to selectively increase the optimization level for specific functions, only to disable optimization.
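
To make the asymmetry concrete, here is a minimal sketch of the two per-function controls. The macro and function names are hypothetical, and the optimize argument is spelled "O3" here rather than "-O3"; treat this as an illustration under those assumptions, not as the exact code that was benchmarked.

```cpp
#include <cassert>

// GCC only: request a higher optimization level for a single function.
// Clang does not implement this attribute, so the macro expands to nothing.
#if defined(__GNUC__) && !defined(__clang__)
    #define MY_OPT_O3 __attribute__((optimize("O3")))
#else
    #define MY_OPT_O3
#endif

// Clang only: the opposite direction, disabling optimization per function.
#if defined(__clang__)
    #define MY_NO_OPT [[clang::optnone]]
#else
    #define MY_NO_OPT
#endif

// A "core library" style function we would like optimized even in debug builds.
MY_OPT_O3 inline float dot4(const float* a, const float* b) {
    float s = 0.0f;
    for (int i = 0; i < 4; ++i) s += a[i] * b[i];
    return s;
}

// A function we want to keep unoptimized (e.g. to step through it in a
// debugger) even in an otherwise optimized Clang build.
MY_NO_OPT float scale_and_offset(float x) {
    return x * 2.0f + 1.0f;
}

inline void example_usage() {
    const float a[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    const float b[4] = {10.0f, 20.0f, 30.0f, 40.0f};
    assert(dot4(a, b) == 300.0f);
    assert(scale_and_offset(2.0f) == 5.0f);
}
```
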

Sad that we have to jump through all these hoops or revert to C-like code to get good debug performance, but I guess it's just the way it is with the current compiler tech.

aras-p / test_math_vec_debug_perf

Inlining attributes? #1