EgorBo opened this issue 4 years ago
I'm not sure how long this would take to do, but as a quick workaround we could also mark the methods of the IntPtr class as AggressiveInlining.
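For reference, a minimal sketch of what that workaround means in C# (the helper type and method name here are made up for illustration and are not the actual IntPtr source):

```csharp
using System;
using System.Runtime.CompilerServices;

static class IntPtrHelpers
{
    // AggressiveInlining asks the JIT to bypass its size-based heuristics
    // and inline this method wherever it legally can.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static bool AreNotEqual(IntPtr a, IntPtr b) => a != b;
}
```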
FWIW, I have ambitions of rewriting all the RyuJit inlining heuristics at some point. Some of what's there is worth keeping, but there are lots of areas that need improving.
Pros:
Cons:
Challenges:
It seems to be a recurring theme that some optimizations are impractical because the necessary analysis is either not available yet or is too expensive altogether. For example, the design document says:
Currently the jit walks its IR in a linear fashion, deciding whether to inline each time it sees a candidate. If the decision is yes then the inlined code is spliced in place of the call and (because of the order of the walk) immediately scanned for inlining candidates. Thus the inlining is performed "depth first" and is done without much knowledge of the number or location of other candidates in the code stream. Inlining is done very early on, before any significant analysis has been done to the IR -- there is a flow graph, but no loop nesting, dataflow, or profile estimates are generally available.
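As a toy illustration of that depth-first walk (purely hypothetical types and thresholds, not RyuJIT's actual code):

```csharp
using System;
using System.Collections.Generic;

class ToyInliner
{
    public record CallSite(string CalleeName, int CalleeIlSize, List<CallSite> CalleeCalls);

    // Each decision is local: no call graph, no loop nesting, no profile data.
    static bool ShouldInline(CallSite c) => c.CalleeIlSize <= 20;

    // Walk call sites in linear order; whenever a callee is inlined,
    // immediately scan the spliced-in code for further candidates.
    public static void InlineWalk(List<CallSite> callSites, int depth = 0)
    {
        foreach (var call in callSites)
        {
            if (!ShouldInline(call)) continue;
            Console.WriteLine($"{new string(' ', depth * 2)}inline {call.CalleeName}");
            InlineWalk(call.CalleeCalls, depth + 1);   // "depth first"
        }
    }
}
```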
Could precomputation help with this? There could be a build step that analyzes a set of assemblies and writes a summary useful for optimizations to disk. The summary could contain a call graph, for example. It could also contain profile data.
The analysis tool could also concretely evaluate every possible inlining decision by performing the inlining and then simplifying. This would expose inlining opportunities that result in code size reduction but that are not obvious to any simple heuristic.
Data useful for optimizations other than inlining could be included as well. A data flow summary comes to mind (e.g. "this variable can never be null; this other variable always is greater than zero").
This technique would not be AOT. The data would not be machine-specific or versioning-brittle.
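To make the idea concrete, here is a hypothetical shape for such a precomputed summary (none of these types exist in the runtime; this is just a sketch of the kind of data an offline build step could persist):

```csharp
using System.Collections.Generic;

// Hypothetical per-assembly summary emitted by an offline analysis step and
// consumed by the JIT. Nothing here is machine-specific, so it would not be
// versioning-brittle in the way precompiled code is.
public sealed class AssemblyOptimizationSummary
{
    // Call graph: caller method token -> callee method tokens.
    public Dictionary<int, List<int>> CallGraph { get; } = new();

    // Dataflow "facts" or "hints" per method, e.g. "arg 0 is never null",
    // "local #3 is always > 0", "inlining this callee shrinks the caller".
    public Dictionary<int, List<string>> Facts { get; } = new();

    // Optional profile data: method token -> observed call count.
    public Dictionary<int, long> CallCounts { get; } = new();
}
```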
Could precomputation help with this? Data useful for optimizations other than inlining could be included as well...
Yes, it is one of the things we plan to look into -- using precompilation analysis to leave information for jitting, either as "facts" or "hints".
The analysis tool could also concretely evaluate every possible inlining decision
Combinatorics makes this infeasible. If there are N top-level call sites, there are 2^N possible ways to inline, all of which may produce different codegen.
@jandupej - something for 9.0
RyuJIT's heuristics (at least some of them) for inlining can be re-used in Mono, since our inliner is quite conservative and only takes IL size into account (20 bytes for the JIT, 30 bytes for the LLVM backend). RyuJIT also estimates native code size and is able to inline methods above the threshold. It uses so-called observations and increases/decreases a benefit multiplier based on them (and also on the expected performance and size impact), e.g.:
We can start from some simple heuristics, e.g. "Inline candidate has an arg that feeds a constant test", to inline methods above our IL size limit. For example, the IL for `System.IntPtr:op_Inequality(long,long):bool`:

RyuJIT's log:

While Mono-JIT (without LLVM) simply refuses to inline it based on `IntPtr.op_Inequality`'s IL size (here). So even a simple "candidate has a const arg that feeds a conditional" multiplier could help to jump over the limit and get faster and smaller codegen for this case. /cc @BrzVlad @vargaz @lewurm @marek-safar
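To illustrate why that observation matters, here is a hedged C# sketch (the method and the size limit are made up, not taken from the BCL or Mono): when the callee's argument feeds a test and the caller passes a constant, inlining lets the JIT fold the branches away, so the inlined code can end up smaller than the call itself even though the callee's IL is over the byte limit.

```csharp
static class InlineExample
{
    // Hypothetical callee: its IL is above a small byte-size inline limit,
    // but the argument feeds constant tests.
    static int Clamp(int value)
    {
        if (value < 0)        // "arg feeds a constant test"
            return 0;
        if (value > 255)
            return 255;
        return value;
    }

    static int Caller()
    {
        // The argument is a constant, so after inlining both branches fold
        // away and the whole call site reduces to "return 42".
        return Clamp(42);
    }
}
```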
I recently tried to implement my own RyuJIT heuristic just to learn how it works: https://github.com/dotnet/runtime/issues/33338#issuecomment-596149604