Open Quuxplusone opened 6 years ago
In terms of the LLVM source code itself, we should probably add something analogous to Value::hasNUses().
In terms of optimizing code like this, we might be able to figure it out. If a loop has no side-effects (memory writes/atomics/etc.), and the only value live out of the loop is an induction variable, and the only use of that induction variable is an equality comparison, and the induction variable can't wrap, we can transform the loop to exit early.
In some ways I'd love to see this just fixed in the optimizers without needing to change LLVM's source (you know what I mean - yes, changing the optimizer is changing the source... but not changing all the call sites that query the length of linked lists). Just from a purist sort of perspective that'd be "nice".
That said, yeah, querying for a specific linked list size (or greater than, or less than, etc) seems like something APIs should support, and I imagine at some point maybe someone'll propose it for std::{forward_}list & having similar APIs for our lists is probably quite reasonable rather than relying on optimization heroics.
(& I don't really have the skills/domain knowledge to really work on fixing this)
Eli/others - do you happen to know any particular optimizations that might be doing any parts of this work that could/should be enhanced to do the rest? Or in what optimization pass (if there is an existing one) this could/should live in generally?
Some kind of loop unrolling perhaps? Well I guess it doesn't have to be unrolled at all, it could involve the comparison being moved into the loop condition. (guess that's an optimization tradeoff though - if a list was always of size 2 or 3, comparing on every iteration might slow things down (the comparison at 1 would always fail, for instance)?)
Probably makes sense as a separate loop optimization pass; none of our existing loop passes change the trip count, or do any similar analysis of induction variables. We have some existing code in SCEV for figuring out whether a loop has side-effects, and checking the live-out values is trivial in LCSSA form, so it's probably not that hard to write.
Probably the transform itself should move the comparison into the loop, and then we can unroll it afterwards if it makes sense. Even if we always want to unroll, it's simpler to rewrite the latch, as opposed to trying to modify the unroller. I haven't really considered what the heuristics should look like, though.
In llvm, there's plenty of hot code that looks like this:
Here, pred_size takes linear time, but there's never a need to visit more than two predecessors. In a stage2 Release build of llc, we do 45 million unnecessary linked list iterations just at this one call site [1].
Is there a way for the optimizer to recognize that no more than two iterations are needed?
Reduced example (https://godbolt.org/z/TOtQD6):
Currently we produce:
Could we somehow make the loop exit early if the sum ever exceeds 2? Like:
[1] I defined 'wasted iterations' as
pred_size(BB) - 2
, whenever that difference would be positive. I did the stage2 build with r347179.