Open stephentoub opened 5 months ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.
@dotnet/jit-contrib
The Math.Min
is effectively just:
public static int Min(int a, int b)
{
return (a <= b) ? a : b;
}
I am curious if we can recognize GT_SELECT
early in importer too , so then it's naturally CSE'd, etc. (and folded for constant input). Currently, GT_SELECT
is recognized only during late phases.
I don't think anything stops us from running if-conversion earlier. One strategy would be to canonicalize into GT_SELECT
form much more aggressively and then expand back to control flow later for cases where we think it's too expensive to compute the arms unconditionally. Probably the most practical solution here (vs something more general that teaches CSE to handle the hammocks in the flow graph).
@a74nh I'm curious if you have any thoughts about running if-conversion earlier. I recall you ended up moving the phase a few times when you added it.
Jump threading should be able to clean this sort of thing up too, but likely is currently blocked by side effects.
Jump threading should be able to clean this sort of thing up too, but likely is currently blocked by side effects.
Looks like RBO only optimizes just one condition:
(Before RBO - After RBO)
That looks like what we want, BB04 is now bypassed... is there something below there that doesn't get fixed?
Would jump threading be expected to handle the real world case that @stephentoub showed above, in regex? That one has more interposing stuff.
@a74nh I'm curious if you have any thoughts about running if-conversion earlier. I recall you ended up moving the phase a few times when you added it.
An earlier version of If Conversion did have it much earlier (before SSA I think). It was eventually moved so that it was directly after Optimize Bools, because Optimise Bools needs to combine multiple conditions within an if
into a single AND
chain before the If Conversion.
You could move them both earlier. I think as long as you've detected loops it should be ok. I'm not sure if it would interfere with some of the loop optimisations (hoisting, inverting etc).
That looks like what we want, BB04 is now bypassed... is there something below there that doesn't get fixed?
Egor you probably checked but indeed the lower compares are recognized as redundant, but the chain of adds blocks jump threading.
Dominator BB10 of BB13 has relop with same liberal VN
N003 ( 3, 3) [000064] J------N--- * LE int $100
N001 ( 1, 1) [000018] ----------- +--* LCL_VAR int V01 arg1 u:1 $80
N002 ( 1, 1) [000063] ----------- \--* CNS_INT int 42 $42
Redundant compare; current relop:
N003 ( 3, 3) [000074] J------N--- * LE int $100
N001 ( 1, 1) [000025] ----------- +--* LCL_VAR int V01 arg1 u:1 $80
N002 ( 1, 1) [000073] ----------- \--* CNS_INT int 42 $42
Both successors of idom BB10 reach BB13 -- attempting jump threading
BB13 has side effects; no threading
Either we need to be willing to destroy SSA or we need some way to update it by creating duplicate defs.
Before:
After: we need to copy the PHI/ADD code in BB13 into BB11 and BB12. The PHI resolves to either V9.1 or V9.2 depending on pred, but we now need two defs for V6, and we need to add a PHI for V6 down in BB16, and update that use to the PHI def.
Currently we know that V6.4 has one use, but we don't know where it is, and we don't know all the places we might need to add phis.
We could I suppose have RBO just build up a list of blocks where jump threading was blocked like this, and remember how to reroute each pred, and then at the end of the optimizer phases, run through that list... provided the flow structure still matches we could then just duplicate IR and rewire flow. It might be a bit fragile though.
Or if we moved RBO to the end like we've been envisioning if/when we can pay for opt repeat, it can just do this as a second action.
I was expecting multiple
Math.Min
calls with the same Int32 arguments to be elided via CSE, but that's not happening. SharpLabresulting in:
It's not a big deal, but the actual case where I noticed this came from the regex source generator, where it was emitting code like this:
in order to search a span but only up to a specific length; if it doesn't find what it's looking for within that length, it needs to update the position to the end of what it did examine. It's doing so with multiple
Math.Min(span.Length, int32limit)
calls, and each is resulting in the relevant comparisons rather than being CSE'd, even though the inputs provably don't change between calls.