dotnet / runtime


JIT: Interprocedural Analysis (IPA) in .NET 10 #108931

Open AndyAyersMS opened 4 days ago

AndyAyersMS commented 4 days ago

Interprocedural Analysis and Optimization

Interprocedural analysis (IPA) is a compilation technique where the analysis considers the effects of one procedure on another for some set or sets of interesting facts. In particular it is often interesting to consider:

More generally the IPA can be defined over a call graph representing the entire set of caller-callee relationships in some body of code.

IPA facts are typically used to direct program optimizations.

Currently the JIT can only optimize each method in isolation (with a limited form of IPA enabled by inlining). IPA would enable JIT optimizations that cross method boundaries without requiring inlining.

Generally speaking, IPA is only feasible during ahead-of-time (AOT) compilation, as it requires substantial amounts of time and space. But the optimizations enabled by IPA can be done either AOT or just-in-time (JIT): while the analysis itself is potentially expensive (in time and space), its results may be compact, and the optimizations they enable can be applied quickly. So perhaps we could embed the results of IPA as an extra payload within an assembly (similar to how we embed static PGO data today) and make the information available at JIT time, enabling IPA optimizations by the JIT.

Note that in the presence of profiler-driven rejitting, the JIT cannot rely on pre-computed IPA facts as a guarantee, but they can be used as a strong hint. We may want a mode where we insist (in some manner TBD) that rejit updates be "conforming" so that they do not invalidate IPA facts.

Proposed .NET 10 work

The proposal here is to build up a general IPA framework (see notes below) and use it to enable one or more optimizations. As this mode may be NAOT specific, we might also need to develop better testing strategies for NAOT, or else also run this analysis when jitting (somehow), with suitable restrictions to ensure the analysis results are guaranteed to remain valid.

Related work: Use RyuJit as an IL Scanner (https://github.com/dotnet/runtime/issues/83021)

Work in Q4'25

Eugene Rozenfeld has done a prototype of an IPA framework built into crossgen2, so the first step is likely to be reviving this work and looking into incorporating it into ILC.

Design Sketch

Here’s a sketch of how IPA could be built.

First, some goals:

And a non-goal:

The main idea is to split IPA into two sub-parts: within (intra) procedural, and across (inter) procedural.

The intra-procedural part generates summary facts for methods based on the IL (aka "jump functions") and ultimately consumes them, either directly (by reading summary data) or indirectly (via queries to the global solution).

The inter-procedural part stitches together the problem instance (the call graph), orchestrates the concurrency, runs a solver to reach suitable fixed points, and may keep global models for some problem domains.
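To make the split concrete, here is a minimal interface sketch; the names (`MethodId`, `IMethodSummary`, `ILocalAnalysis`, `IGlobalSolver`) are hypothetical and not existing ILC/crossgen2 APIs:

```csharp
// Placeholder for whatever identifies a method in the compiler
// (in ILC/crossgen2 this would be something like a MethodDesc).
public sealed record MethodId(string Name);

// Per-method summary facts ("jump functions") produced by the intra-procedural phase.
public interface IMethodSummary<TSelf>
{
    // Merge facts from another summary; returns true if this summary changed.
    bool MergeFrom(TSelf other);
}

// Intra-procedural side: derive a summary from a method's IL, and later consume
// the global solution (directly or via queries) while generating code for it.
public interface ILocalAnalysis<TSummary>
{
    TSummary Summarize(MethodId method);
}

// Inter-procedural side: owns the call graph, drives summaries to a fixed point,
// and answers queries from the local analysis.
public interface IGlobalSolver<TSummary>
{
    void Solve();
    TSummary GetSolution(MethodId method);
}
```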

Summary facts are represented via individual finite lattices (can be infinite with suitable care). So there is some symbolic representation of facts with the ability to rank sets of facts and merge or intersect them. There are often bottom and top elements representing things like no information or conflicting information.
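As an illustration, here is a minimal sketch (hypothetical names, not an existing implementation) of one such lattice: a parameter's value tracked as "no information yet", a single known constant, or "conflicting/unknown":

```csharp
// Illustrative lattice for interprocedural constant propagation of one parameter.
public readonly struct ConstLattice
{
    private enum Kind { Bottom, Constant, Top }
    private readonly Kind _kind;
    private readonly long _value;

    private ConstLattice(Kind kind, long value) { _kind = kind; _value = value; }

    public static ConstLattice Bottom => new(Kind.Bottom, 0);      // no information yet
    public static ConstLattice Top => new(Kind.Top, 0);            // conflicting / unknown
    public static ConstLattice Constant(long value) => new(Kind.Constant, value);

    public bool IsConstant(out long value) { value = _value; return _kind == Kind.Constant; }

    // Merge facts arriving from different call sites (the lattice join).
    public ConstLattice Merge(ConstLattice other)
    {
        if (_kind == Kind.Bottom) return other;
        if (other._kind == Kind.Bottom) return this;
        if (_kind == Kind.Constant && other._kind == Kind.Constant && _value == other._value)
            return this;
        return Top;  // different constants, or either side already unknown
    }
}
```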

Usually the fixed-point summaries are "sound", meaning that anything that can possibly be true is covered by the summary, though the summary may also admit some things that cannot actually be true (it over-approximates). There is usually an accuracy-time-space tradeoff; more accurate results may take longer or require more detailed models.

Occasionally it is interesting to allow for unsound results, in particular if the system is trying to predict what is likely to be true rather than what must be true (~ probabilistic data flow). In such cases we can rely on the facts as hints (say, to drive inlining in likely profitable directions) or else take a dependence on them via guarded or speculative techniques to ensure we have the ability to recover if those facts end up not being true when the code is executed.

For example, if we have to assume that a method may be called via reflection, we may not have all incoming call graph edges for the node corresponding to the method. Therefore, we can’t compute the exact set of values that the method’s parameters may take, but we may be able to compute the set that is likely. Optimizations can still take advantage of the additional information about the parameters but have to make sure the code that is ultimately executed is correct for all cases. Another example is the possibility of method bodies that can change after the analysis (e.g., via profiler re-jit). In this case, information computed about a method before its body changed may not be correct. If that information was used for optimizing any other methods (e.g., the method’s callers), the code for those methods has to be re-generated. We could use a technique similar to the current inline tracking to detect when one method’s codegen depends upon another so that the right invalidations can happen.
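As a hand-written illustration of the guarded approach (this is not actual JIT output; the method and the "likely" fact are made up): suppose an unsound summary predicts that callers almost always pass `shift == 3`.

```csharp
// Illustration only: the shape of a guarded specialization driven by a likely-only fact.
public static class GuardedSpecializationExample
{
    public static int Scale(int value, int shift)
    {
        if (shift == 3)                      // guard on the predicted, unproven fact
            return value << 3;               // specialized fast path
        return ScaleGeneral(value, shift);   // fallback keeps the code correct for all callers
    }

    private static int ScaleGeneral(int value, int shift) => value << shift;
}
```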

The Call Graph

The IPA solver starts by constructing the call graph. This usually requires IL analysis to try to determine which methods are called. The call graph must represent external or unknown callers (since not all call sites may be knowable) and external and/or unknown callees (since the full set of methods that can be called may not be knowable). There are varying degrees of sophistication in call graph construction. Determining the set of methods that can be called by a method is itself an IPA problem. The problem may also have dynamic behavior (say, determining the set of required generic instantiations) where, as the solution process evolves, the candidate set of methods can change. The call graph need not be 100% accurate (indeed, it usually cannot be) but it should be a sound approximation. At some point the IPA solver decides it has a suitably sound approximation of the call graph and we move on to the next stage.

[The “base case” here is to just assume any method can be called from unknown places and calls unknown other methods; this causes IPA to degenerate to regular intraprocedural analysis. This can be useful for handling things like debug compiles, where we would prefer not to have information propagate across methods, and all method codegen can in principle happen concurrently (as the degenerate call graph dag is just a bunch of isolated nodes)].

For partitioned problems the externally calling and callable methods may be represented as placeholders, and their IPA summaries made available to help refine the results in the current partition.
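A minimal sketch of a call graph representation that models unknown callers and callees explicitly via a distinguished placeholder node (the class and member names here are illustrative, not from the prototype):

```csharp
using System.Collections.Generic;
using System.Linq;

public sealed class CallGraph
{
    // Distinguished placeholder for "the outside world": reflection, exported
    // entry points, dynamically generated code, and so on.
    public const string Unknown = "<unknown>";

    private readonly Dictionary<string, HashSet<string>> _callees = new();

    public void AddEdge(string caller, string callee)
    {
        if (!_callees.TryGetValue(caller, out var set))
            _callees[caller] = set = new HashSet<string>();
        set.Add(callee);
    }

    // A method whose address can escape (reflection, delegates, exports) is
    // conservatively treated as callable from Unknown.
    public void MarkExternallyCallable(string method) => AddEdge(Unknown, method);

    // An indirect call whose target set cannot be resolved becomes an edge to Unknown.
    public void MarkCallsUnknown(string caller) => AddEdge(caller, Unknown);

    public IEnumerable<string> CalleesOf(string caller) =>
        _callees.TryGetValue(caller, out var set) ? set : Enumerable.Empty<string>();
}
```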

Creation of a DAG

From the call graph, the solver then creates a partial order (dag) of methods, typically by running cycle detection (dfs). This dag defines the concurrency constraints for the local analysis. Commonly we are interested in either top-down (callers->callees) or bottom-up (callees->callers) traversals of this dag.
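One standard way to build that dag is to collapse strongly connected components (mutually recursive methods) into single nodes. Here is a minimal sketch using Tarjan's algorithm (illustrative only, not the prototype's implementation):

```csharp
using System;
using System.Collections.Generic;

public static class Condensation
{
    // graph: adjacency list, caller -> callees. Returns one list of methods per SCC.
    // Tarjan emits SCCs in reverse topological order of the condensation, i.e.
    // callees before callers, which is the order a bottom-up pass wants.
    public static List<List<string>> StronglyConnectedComponents(
        Dictionary<string, List<string>> graph)
    {
        var index = new Dictionary<string, int>();
        var lowlink = new Dictionary<string, int>();
        var onStack = new HashSet<string>();
        var stack = new Stack<string>();
        var result = new List<List<string>>();
        int nextIndex = 0;

        void StrongConnect(string v)
        {
            index[v] = lowlink[v] = nextIndex++;
            stack.Push(v);
            onStack.Add(v);

            foreach (var w in graph.TryGetValue(v, out var succ) ? succ : new List<string>())
            {
                if (!index.ContainsKey(w))
                {
                    StrongConnect(w);
                    lowlink[v] = Math.Min(lowlink[v], lowlink[w]);
                }
                else if (onStack.Contains(w))
                {
                    lowlink[v] = Math.Min(lowlink[v], index[w]);
                }
            }

            if (lowlink[v] == index[v])   // v is the root of an SCC
            {
                var scc = new List<string>();
                string w;
                do { w = stack.Pop(); onStack.Remove(w); scc.Add(w); } while (w != v);
                result.Add(scc);
            }
        }

        foreach (var v in graph.Keys)
            if (!index.ContainsKey(v))
                StrongConnect(v);

        return result;
    }
}
```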

Passes

The solver then runs some number of alternating traversal (concurrent, over the dag) and solver passes.

In a traversal pass, the local analysis is run; it can access facts produced in prior passes as well as facts available from “predecessor” nodes in the current ordering. It produces new summary facts and/or (ultimately) code based on those facts. This pass runs concurrently, with the “schedule” determined by the dag.

In a solver pass, the IPA system uses the accumulated summary facts to reach fixed points. This uses the full call graph and is usually solved iteratively: a node is chosen, a new summary state is generated from the current state and the most recent ‘input’ states (callers for a top-down problem, callees for a bottom-up one), and the new result is published. The entire process iterates until each node reaches a fixed point where no node changes state. There are various routes to accelerate convergence here; e.g., if the call graph is truly a dag and has no cycles then any uni-directional analysis will converge in one pass. This solving typically runs non-concurrently, though concurrent iteration is possible.
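A minimal sketch of such an iterative solver, written as a sequential worklist loop over a top-down problem (names and signatures are hypothetical; it assumes every call graph node has an initial summary, and that `recompute` is a monotonic function over a finite lattice so the loop terminates):

```csharp
using System;
using System.Collections.Generic;

public static class FixedPointSolver
{
    // Top-down: a method's summary is recomputed from its callers' summaries;
    // when it changes, its callees (the dependents) are revisited.
    public static void SolveTopDown<TSummary>(
        Dictionary<string, List<string>> callers,   // method -> its callers
        Dictionary<string, List<string>> callees,   // method -> its callees
        Dictionary<string, TSummary> summaries,     // current per-method state
        Func<string, IReadOnlyList<TSummary>, TSummary> recompute)
    {
        var worklist = new Queue<string>(summaries.Keys);
        var inList = new HashSet<string>(summaries.Keys);

        while (worklist.Count > 0)
        {
            string method = worklist.Dequeue();
            inList.Remove(method);

            // Gather the most recent input states (the callers' summaries).
            var inputs = new List<TSummary>();
            if (callers.TryGetValue(method, out var callerList))
                foreach (string c in callerList)
                    inputs.Add(summaries[c]);

            TSummary updated = recompute(method, inputs);
            if (!EqualityComparer<TSummary>.Default.Equals(updated, summaries[method]))
            {
                summaries[method] = updated;
                // Summary changed: revisit the dependents.
                if (callees.TryGetValue(method, out var calleeList))
                    foreach (string d in calleeList)
                        if (inList.Add(d))
                            worklist.Enqueue(d);
            }
        }
    }
}
```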

Generally the IPA solver part is maybe 10% of the overall work and the local analysis 90%. Amdahl's law scaling says speedups of around 9x are possible; 4-6x is more realistic and is what we have seen in similar implementations in other compilers.
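For reference, Amdahl's law puts the best-case speedup at 1 / (s + (1 - s) / N) for a serial fraction s on N workers; with s ≈ 10% this approaches 10x as N grows and is roughly 9x already at around 90 workers.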

The summary facts may be bundles of independent lattice results or the lattice solutions can interact. For example if we are doing both interprocedural constant propagation and interprocedural alias analysis, the parts of code reachable in a method will depend on constant propagation, and this in turn may alter the sets of possible aliases. It is well known that “mutual fixed points” can be more accurate than trying to fix point each problem separately.

Particular problems can often be classified as top-down or bottom-up. Some problems are bidirectional.

Generally summaries represent facts true at all invocations of a method. One can try to do path-specific analysis, where summary data applies only to some callers or some specific pattern of callers (sometimes called contextual analysis), but when doing this the memory requirements and logistics generally get complicated. Sometimes you can special-case this (e.g., a helper that just turns around and invokes a method passed to it as an argument). Otherwise, high fan-in/fan-out nodes can dramatically increase the amount of approximation in a solution, as facts flow along so-called “infeasible paths.”

One can also do contextual analysis via inlining. When generating code concurrently and actually inlining, care must be taken not to access IPA facts out of dependence order, so some special access check logic is needed for summary accesses.
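A minimal sketch of what that access-check logic could look like: summaries are only observable once published for the current pass, and anything not yet published degrades to a conservative answer (the names here are illustrative only):

```csharp
using System.Collections.Concurrent;

public sealed class SummaryStore<TSummary>
{
    private readonly ConcurrentDictionary<string, TSummary> _published = new();
    private readonly TSummary _conservative;

    public SummaryStore(TSummary conservative) => _conservative = conservative;

    // Called when a node's local analysis finishes in the current traversal pass.
    public void Publish(string method, TSummary summary) => _published[method] = summary;

    // Called from codegen (including after inlining a callee's IL into a caller):
    // only published summaries may be observed; anything else falls back to the
    // conservative answer so facts are never read out of dependence order.
    public TSummary GetOrConservative(string method) =>
        _published.TryGetValue(method, out var s) ? s : _conservative;
}
```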

Analyses

Typical top-down problems are:

Typical bottom-up problems are:

And some true IPA problems:

Historically, in native compiles, just doing IPA hasn’t been that big of a perf win (think single digits). Many times code expansion is needed to take advantage of IPA facts, and this expansion can’t be done blindly, as it has costs.

But in conjunction with profile feedback to help focus code expansion into profitable areas, IPA really shines.

Things may be different for managed code; we may see more substantial benefits “more cheaply” because of the relatively higher costs of some of our abstractions.

dotnet-policy-service[bot] commented 4 days ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch. See info in area-owners.md if you want to be subscribed.