Open osa1 opened 9 months ago
I agree that inlining is interesting to include in the first version already. So far, we planned to use the `call_target` information for `call_ref` and `call_indirect` as heuristics for inlining as well. But for direct calls, this does not make much sense.
My personal preference would be to keep those annotations at the per-call level rather than the function level. But I'd be open to a combined solution like you suggested. I would just not feel comfortable with a function-level-only version.
I would also prefer some kind of numeric attribute, with e.g. 0 being equivalent to "never-inline" and 100 to "always-inline". Values in between would leave the engine some room to prioritize calls when honoring every inlining hint would be too expensive, for example if a function would get too big to process on low-end devices.
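To make the numeric idea a bit more concrete, here is a rough sketch of how an engine might rank call sites under a size budget. Everything in it is hypothetical and not part of any proposal text: the `CallSite` fields, the `pick_inlined_calls` name, the cost measure, and the choice to treat 100 as unconditional are all assumptions made for illustration.

```python
from dataclasses import dataclass

NEVER_INLINE = 0     # assumed meaning of hint value 0
ALWAYS_INLINE = 100  # assumed meaning of hint value 100


@dataclass
class CallSite:
    callee: str        # name of the called function (illustrative)
    hint: int          # hypothetical per-call hint in the range 0..100
    callee_size: int   # hypothetical inlining cost, e.g. instruction count


def pick_inlined_calls(call_sites, budget):
    """Pick callees to inline, highest hint first, within a size budget.

    Assumption: hint 0 is never inlined, hint 100 is inlined even if it
    exceeds the budget, and anything in between is inlined only while
    budget remains.
    """
    chosen = []
    remaining = budget
    # sorted() is stable, so equal hints keep their original order.
    for site in sorted(call_sites, key=lambda c: c.hint, reverse=True):
        if site.hint == NEVER_INLINE:
            continue
        if site.hint == ALWAYS_INLINE or site.callee_size <= remaining:
            chosen.append(site.callee)
            remaining -= site.callee_size
    return chosen
```

With a budget of 80 and call sites hinted 100 (size 40), 60 (size 30), 20 (size 30), and 0 (size 5), this sketch inlines the first two and skips the others. A real engine would presumably combine the hint with its own heuristics rather than sort on it alone.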
I think we should add inlining hints to the draft. Previous discussion on inlining hints: https://github.com/WebAssembly/branch-hinting/issues/18.
Some of the use cases that would be good to cover with inlining hints:
"always-inline" and "never-inline" hints:
By default, a tool or engine uses heuristics as usual. These two hints disable the heuristics and tell the engine/tool to always or never inline a particular function, or the callee at a particular call site.
Allow annotating call sites and functions:
Annotating functions is useful because an optimization pass can turn an indirect call (i.e. a `call_indirect` or `call_ref`) to a direct call (`call`). In those cases the original indirect call instructions won't have an inlining hint, but the function directly called can.
Secondly, annotations on functions can be used by a speculative inliner. A function with "never-inline" hint should not be inlined speculatively.
Annotating call sites allows expressing "never inline this function, except at this call site".
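One possible way to combine the two levels, sketched below, is to let a call-site hint override the function-level hint and to fall back to the usual heuristics when neither is present. The precedence rule and all names here (`InlineHint`, `effective_hint`, `may_speculatively_inline`) are assumptions for illustration, not something the draft specifies.

```python
from enum import Enum
from typing import Optional


class InlineHint(Enum):
    ALWAYS = "always-inline"
    NEVER = "never-inline"


def effective_hint(
    call_site_hint: Optional[InlineHint],
    function_hint: Optional[InlineHint],
) -> Optional[InlineHint]:
    """Resolve the hint for one call.

    Assumed precedence: the call-site hint wins, then the function-level
    hint. None means "no hint, use the normal heuristics".
    """
    if call_site_hint is not None:
        return call_site_hint
    return function_hint


def may_speculatively_inline(function_hint: Optional[InlineHint]) -> bool:
    """A speculative inliner only sees the target function, so it can
    consult only the function-level hint; "never-inline" blocks it."""
    return function_hint is not InlineHint.NEVER
```

Under this rule, "never inline this function, except at this call site" becomes a function-level `NEVER` plus a call-site `ALWAYS` at the one exception. A direct call produced from a former `call_indirect` or `call_ref` carries no call-site hint, so it picks up the function-level one.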
An example use case: in dart-lang/sdk#54395 we have an error handling code path that we want to never inline. If wasm-opt inlines it at compile time, or V8 inlines it at runtime (or instantiation time), V8 decides not to speculatively inline calls to the parent function (because the function gets large with the error handling), leaving 7% perf on the table.