Generalize 'inline' hints to 'instruction_frequency' or 'block_frequency'?

WebAssembly / compilation-hints

Compilation Hints

Other

0 stars 0 forks source link

Generalize 'inline' hints to 'instruction_frequency' or 'block_frequency'? #8

Open eqrion opened 3 weeks ago

eqrion commented 3 weeks ago

I've been trying to understand how to interpret the 'inline' hint section.

If I understand correctly, it only specifies how many times a call instruction is executed during an average execution of the function that contains it. This can be used to hint that a call is 'cold' and so even if the call target is completely monomorphic and inlineable, it may not be worth inlining.

This seems to be generally useful, and not specific to call instructions. Could we generalize so that these hints can point at any instruction? If we did that we could rename this to something like 'instruction_frequency' (or something else).

There's some strong overlap here with branch hinting, because the frequency of an instruction getting executed is really tied to how often its block is executed, which is tied to how often the branches that target it execute. So an alternate design could be to specify how frequently blocks/branches are executed, instead of the call instruction within them.

ecmziegler commented 2 weeks ago

We could do that, sure. Such information is very expensive to collect through instrumentation, but could be easily obtained through sampling-based profiling with incomplete but likely good enough information to be interesting.

I haven't thought much about what we could to with that information, but there is certainly potential. We could e.g. decide not to do certain expensive optimizations within rarely executed blocks (e.g. loop unrolling). Knowing that a loop gets executed 10000 times vs only once could make a difference.

eqrion commented 2 weeks ago

We could do that, sure. Such information is very expensive to collect through instrumentation, but could be easily obtained through sampling-based profiling with incomplete but likely good enough information to be interesting.

I haven't thought much about what we could to with that information, but there is certainly potential. We could e.g. decide not to do certain expensive optimizations within rarely executed blocks (e.g. loop unrolling). Knowing that a loop gets executed 10000 times vs only once could make a difference.

I agree that collecting this information for all blocks would be expensive, and I don't have a big use-case for it outside of call instructions yet.

It seems like there wouldn't be much of a cost to generalizing this hint, as long as we can let the profilers (that record the hints) decide how much of this information is worth collecting and shipping in a binary. I'd guess they would by default only collect for call sites.

titzer commented 2 weeks ago

I like the idea of generalizing to instruction frequency. Personally I wouldn't over-index on how expensive it is to collect instruction frequency, because there are valid offline profiling scenarios where instrumentation cost isn't an issue. E.g. we're looking at using Wizard to analyze large repeatable program traces now and it's easy to run them over and over again with different analyses.

ecmziegler commented 1 week ago

We can certainly allow for additional hints here. I just want to limit the scope of the first version to quantities that we can reliable generate and that will impact optimizations in at least one engine.

I will try to come up with a list of potential annotatable instruction types and what we could use them for so people understand if it's worth collecting (and adding, because binary size is also a factor here) such information.

Generally, I want all of these optimizations to be extensible and optional, so adding future frequency annotations will be backwards compatible. Perhaps this needs to made clearer in the overview.