dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.08k stars 4.7k forks source link

JIT: optimize redundant type tests created by PGO #46887

Open AndyAyersMS opened 3 years ago

AndyAyersMS commented 3 years ago

It seems likely that many of the type tests introduced by the jit will be redundant, as multiple virtual or interface calls can be made on the same object.

A first step towards optimizing these is the jump threading introduced in #46257, but this should be generalized to handle the case where there is computation in between the calls. In general this computation must be duplicated so in addition to the logistics of properly duplicating the code, some heuristics will be needed to decide when the duplication is worthwhile.

category:cq theme:profile-feedback skill-level:expert cost:medium

AndyAyersMS commented 3 years ago

It might also make sense to try and catch this earlier during fgOptimizeUncondBranchToSimpleCond so that in cases with back to back type tests on the same object the paths are disambiguated before we get to the optimizer.

AndyAyersMS commented 3 years ago

51890 gets many cases of this even earlier (during guarded devirt transform). But likely we'll want enhancements elsewhere as opportunities will arise downstream.

AndyAyersMS commented 2 years ago

69022 also helps improve jump threading, but we still will bail out when the block has multiple statements.

AndyAyersMS commented 2 years ago

Worked on this a bit.

Mechanically this isn't too bad. We are still trying to avoid creating new blocks. We potentially need to duplicate statements for some subset of preds but can hold costs down by reusing the current block. Looking at potential cost of duplication the bulk of the opportunities happen with duplication costs (as measured by GetCostSz, windows x64 SPMI) of 20 or less:

image

However, we need to figure out how to handle updating SSA. Since we lack a coherent SSA update story, it may be challenging.

If we duplicate uses, we may need to map them to the corresponding phi input ssa nums. That might be tractable, though IIRC we currently do not strongly associate PHI input operands with control flow. If we duplicate defs we may need new SSA numbers and the ability to update the appropriate subset of uses downstream. That seems difficult as we do not track SSA uses for each def.

Here's now things currently work. We insist the jump threaded block have just one side-effect free tree, and we eliminate that tree. So we do not introduce new defs. We do not try and fix the downstream uses. For instance int the AFTER picture below, X's IR should refer to t_0, J should be empty, and Y should refer to t_1.

BEFOREAFTER (current JT)

Furthermore, any downstream uses of t_2 that are reachable from X only should also be updated to t_0; any reachable from Y only should also be updated to t_1, and any reachable from both should inspire an appropriate phi (with subsequent uses likely updated, and possibly more phis, etc).

With this odd state of affairs, if a downstream phase walks the SSA graph it will see extra definitions in some places but will never miss seeing an actual definition. So we have argued (perhaps incorrectly) that this state of affairs is conservatively correct and the only downside of not updating SSA (and related things like VNs) is that downstream phases will miss out on follow-on opportunities.

But if jump threading starts duplicating SSA defs it gets harder to envision what a conservatively correct result looks like. Blind duplication doesn't work as range prop notices the multiple defs.

And if jump threading starts to duplicate SSA uses into X that are defined by a PHI in J, then the uses in X are nonsensical; these should be remapped to whatever use is available at the end of X (which as noted above we may not be able to figure out easily right now).

It may be possible to get some subset of cases by not allowing SSA defs or phis in block J, so all we end up doing is duplicating uses in X that are already available there, but this is likely way too restrictive. I'll take a look, but don't expect to find many cases like this.

To sum up: I'm not sure how to address these problems; seems like we need to tackle some kind of SSA update first.

AndyAyersMS commented 2 years ago

Given the above, I'm going to move this to future.