Exponential code explosion during inlining

Quuxplusone commented 3 years ago


Bugzilla Link	PR50485
Status	NEW
Importance	P normal
Reported by	Joseph Tremoulet (JCTremoulet@gmail.com)
Reported on	2021-05-26 08:06:50 -0700
Last modified on	2021-06-11 08:48:07 -0700
Version	trunk
Hardware	All All
CC	chandlerc@gmail.com, florian_hahn@apple.com, htmldeveloper@gmail.com, JCTremoulet@gmail.com, llvm-bugs@lists.llvm.org
Fixed by commit(s)
Attachments	`explode.ll` (12939 bytes, text/x-matlab)
Blocks
Blocked by
See also

Created attachment 24893 Repro showing pattern for exponential explosion, grows 1000x at this size.

Some of our code started failing to compile when change ed9df5bd2f50b changed the optimization pipeline. The attached repro case demonstrates the issue (seen via opt -passes 'default<O3>' or at https://godbolt.org/z/3vccKbxb5).

The optimizations done to clean up inlined code can result in code being smaller when considered as a transitive inline from a subsequent SCC than it was when considered from its own SCC. This makes the inlines look more favorable after their SCC has already converged, defeating the intent that the bottom-up ordering will ensure that prior SCCs are already flattened out to the point that inlining deeply through them will be undesirable.

Why commit ed9df5bd2f50b triggered this in our code is that the function sizes before and after inlining cleanup now happen to yield inline costs that straddle the threshold.

The attached repro case is structured as follows:

It contains an SCC comprising a series of tiers. Each tier is densely connected (one edge short of a clique), and has one edge to the next tier, with the last having an edge back to the first, completing the cycle.
Each individual function in each tier has just the right size and redundant code (a handful of noop stores) so that calls to it are not inlined when the large SCC is visited, but are inlined when it is revisited by transitive inlining of the next SCC.
The only other code is a singleton SCC (function @outside_scc) that calls into the large SCC.

Inlining in @outside_scc proceeds until the InlineHistory kicks in and cuts it off, which permits one inline of each acyclic path through the SCC, and the number of such paths is exponential in the number of tiers, hence the code explosion.

The original repro cases we've seen this in are actually two of the specint2k benchmarks, though of course this is using our custom toolchain, so I'd say it occurs in code that's real to us (we'll need a fix downstream regardless of what happens upstream).

Quuxplusone commented 3 years ago

Attached explode.ll (12939 bytes, text/x-matlab): Repro showing pattern for exponential explosion, grows 1000x at this size.

Quuxplusone commented 3 years ago

Some ideas for potential approaches to fixing the issue:

1 - During inlining, rather than adding every inlinee's callsites to the worklist of inline candidates, only do so when the inlinee is in the current SCC. This would seem to be in line (no pun intended) with the intent of the inliner, that calls left in already-converged SCCs are unprofitable inlines. But I'm not sure I'm interpreting that intent correctly.

2 - Arrange the optimization passes so that cleanup can't happen between one SCC and the next, so that the inline costs won't change between the time when a call is processed in its own SCC vs the next. I'm not sure how feasible that is.

3 - Adjust the InlineHistory logic to also have a cap on the length of CG paths it inlines. That would make the upper bound on code growth sensitive to the fan-out of the call graph, rather than its depth, which would help because fan-out, unlike depth, can't be arbitrarily increased without making the individual functions too big to inline.

Quuxplusone commented 3 years ago

cc @chandlerc, not sure how closely this may be related to the issue in PR33072.

cc @fhahn, since ed9df5bd2f50b ostensibly triggered this and I don't know if maybe there's a phase-ordering fix that would make sense.

Quuxplusone commented 3 years ago

Hongtao, I'm assigning this to you per your comment "I will work on a separate per-function capping mechanism" at https://lists.llvm.org/pipermail/llvm-dev/2021-June/151006.html, since we expect such a cap to resolve this issue.

Please reassign if I've misunderstood. Thanks.

Quuxplusone / LLVMBugzillaTest

Exponential code explosion during inlining #49454