[RyuJIT] Range check phase effectiveness

mikedn commented 7 years ago

It seems to me that RangeCheck tries too hard and sometimes fails too easily. And we also seem to have cases where range checks reach this phase (and they are eliminated) but they could have been eliminated earlier at a lower cost.

I'm taking a look and I'll use this issue to collect my observations.

category:cq theme:range-checks skill-level:expert cost:large

mikedn commented 7 years ago

Simple example of trying and failing:

static int Test(int i, int[] a) {
    if (i < 0 || i >= a.Length)
        return 0;
    return a[i];
}

It seems that RangeCheck should be capable of removing the unnecessary range check in the above example because it's capable of merging assertions generated by assertion propagation:

[RangeCheck::GetRange] BB04N001 (  1,  1) [000020] ------------             *  LCL_VAR   int    V00 arg0         u:2 $80
{
Merging assertions from pred edges of BB04 for op(000001AE2C3EB200) $080
Constant Assertion: (256, 64) ($100,$40) Loop_Bnd {LT($80, $40)} is  {IntCns 0} index=#02, mask=0000000000000002
Constant Assertion: (256, 64) ($100,$40) Loop_Bnd {LT($80, $40)} is  {IntCns 0} index=#02, mask=0000000000000002
The range after edge merging:<0, Unknown>
Constant Assertion: (257, 64) ($101,$40) Loop_Bnd { {InitVal($40)} LT  {ARR_LENGTH($c0)}} is not  {IntCns 0} index=#04, mask=0000000000000008
Constant Assertion: (257, 64) ($101,$40) Loop_Bnd { {InitVal($40)} LT  {ARR_LENGTH($c0)}} is not  {IntCns 0} index=#04, mask=0000000000000008
The range after edge merging:<0, VN0082 + -1>
   Computed Range (2C3EB200) => <0, VN0082 + -1>
}

It correctly figures out that the range is [0, a.Length) yet it fails to remove the range check because:

Does overflow 000001AE2C3EB200?
Method determined to overflow.

What overflow?! It's obvious that i isn't the result of an operation that can overflow but RangeCheck tries to determine that anyway and fails. At least when i is an argument (VNF_InitVal) it should conclude that it doesn't overflow, just like it already does for constants.

I need to look into this more but it seems that for certain types of ranges checking for overflow (and the subsequent monotonoy check) is simply not necessary, no matter how the value is produced.

pgavlin commented 7 years ago

cc @dotnet/jit-contrib @russellhadley

mikedn commented 7 years ago

Disabling RangeCheck results in the following regression:

Total bytes of diff: 23715 (0.18% of base)
    diff is a regression.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file regressions by size (bytes):
        6951 : System.Private.CoreLib.dasm (0.21% of base)
        2224 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.10% of base)
        2193 : Microsoft.CodeAnalysis.CSharp.dasm (0.10% of base)
        2187 : Microsoft.CodeAnalysis.dasm (0.29% of base)
        2074 : Microsoft.CSharp.dasm (0.56% of base)
40 total files with size differences (0 improved, 40 regressed), 39 unchanged.
Top method regessions by size (bytes):
        1102 : Microsoft.CSharp.dasm - PredefinedMembers:.cctor()
         938 : System.Text.RegularExpressions.dasm - RegexCharClass:.cctor()
         710 : Microsoft.CodeAnalysis.dasm - AttributeDescription:.cctor()
         675 : Microsoft.CodeAnalysis.CSharp.dasm - BuiltInOperators:GetSimpleBuiltInOperators(int,ref):this (2 methods)
         671 : System.Private.CoreLib.dasm - Dictionary`2:OnDeserialization(ref):this (29 methods)
789 total methods with size differences (0 improved, 789 regressed), 64772 unchanged.

Hard to say if a 23715 bytes regression is large or not compared to the amount of work RangeCheck does. It would be nice to get a percentage of eliminated range checks but for code size should do.

Now I need to check how many of these missed range checks can be eliminated earlier.

mikedn commented 7 years ago

One set of range checks that RangeCheck come from impInitializeArrayIntrinsic. For whatever reasons it uses GT_INDEX to obtain the address of the first array element - https://github.com/dotnet/coreclr/blob/master/src/jit/importer.cpp#L3270.

This seems like overkill. GT_INDEX will end up being morphed (which generates a bunch of IR) and then the range check needs to be eliminated. There's no need for range check to begin with since the transform is done only when the array length is known and it's large enough to hold the initialization data.

If the code is changed to obtain the element address by simply adding the array data offset then the above diff changes to:

Total bytes of diff: 18947 (0.15% of base)
    diff is a regression.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file regressions by size (bytes):
        6223 : System.Private.CoreLib.dasm (0.19% of base)
        1851 : Microsoft.CodeAnalysis.CSharp.dasm (0.09% of base)
        1752 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.08% of base)
        1389 : Microsoft.CodeAnalysis.dasm (0.18% of base)
         981 : System.Text.RegularExpressions.dasm (1.03% of base)
38 total files with size differences (0 improved, 38 regressed), 41 unchanged.
Top method regessions by size (bytes):
         938 : System.Text.RegularExpressions.dasm - RegexCharClass:.cctor()
         671 : System.Private.CoreLib.dasm - Dictionary`2:OnDeserialization(ref):this (29 methods)
         540 : System.Reflection.Metadata.dasm - MetadataReader:InitializeProjectedTypes()
         510 : System.Reflection.DispatchProxy.dasm - ProxyBuilder:.cctor()
         435 : Microsoft.CodeAnalysis.CSharp.dasm - BuiltInOperators:GetSimpleBuiltInOperators(int,ref):this (2 methods)
Top method improvements by size (bytes):
         -47 : Microsoft.CodeAnalysis.dasm - DesktopAssemblyIdentityComparer:.cctor()
         -43 : System.Security.Principal.Windows.dasm - WindowsIdentity:.ctor(ref):this (2 methods)
703 total methods with size differences (2 improved, 701 regressed), 64858 unchanged.

So ~20% range checks eliminated by RangeCheck should not exist in the first place.

mikedn commented 7 years ago

Another set of range checks comes from earlyprop, it propagates constant array lengths and when the array index is also constant we end up with range checks like GT_ARR_BOUNDS_CHECK(2, 3). Nothing until RangeCheck removes these useless checks, assertionprop even generates pointless assertions out of them.

Maybe these should be eliminated during earlyprop but for the sake of experimentation I changed assertionprop to get rid of them, it's easier.

Now the diff is:

Total bytes of diff: 15735 (0.12% of base)
    diff is a regression.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file regressions by size (bytes):
        3125 : System.Private.CoreLib.dasm (0.10% of base)
        1795 : Microsoft.CodeAnalysis.CSharp.dasm (0.08% of base)
        1752 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.08% of base)
        1357 : Microsoft.CodeAnalysis.dasm (0.18% of base)
         981 : System.Text.RegularExpressions.dasm (1.03% of base)
38 total files with size differences (0 improved, 38 regressed), 41 unchanged.
Top method regessions by size (bytes):
         938 : System.Text.RegularExpressions.dasm - RegexCharClass:.cctor()
         671 : System.Private.CoreLib.dasm - Dictionary`2:OnDeserialization(ref):this (29 methods)
         540 : System.Reflection.Metadata.dasm - MetadataReader:InitializeProjectedTypes()
         510 : System.Reflection.DispatchProxy.dasm - ProxyBuilder:.cctor()
         435 : Microsoft.CodeAnalysis.CSharp.dasm - BuiltInOperators:GetSimpleBuiltInOperators(int,ref):this (2 methods)
Top method improvements by size (bytes):
         -52 : System.Private.CoreLib.dasm - UnicodeEncoding:GetChars(long,int,long,int,ref):int:this
         -51 : System.Private.CoreLib.dasm - UnicodeEncoding:GetCharCount(long,int,ref):int:this
         -47 : Microsoft.CodeAnalysis.dasm - DesktopAssemblyIdentityComparer:.cctor()
         -43 : System.Security.Principal.Windows.dasm - WindowsIdentity:.ctor(ref):this (2 methods)
         -28 : System.Private.CoreLib.dasm - UTF32Encoding:GetChars(long,int,long,int,ref):int:this
665 total methods with size differences (10 improved, 655 regressed), 64896 unchanged.

Now the number of unnecessary range checks reaches ~33%.

omariom commented 7 years ago

Hard to say if a 23715 bytes regression is large or not compared to the amount of work RangeCheck does

Are there perf tests for that?

mikedn commented 7 years ago

Are there perf tests for that?

We'll see. I still have ~66% range checks to look at to tell what exactly RangeCheck is really good at :)

mikedn commented 7 years ago

Another potentially interesting failure:

int i = ...;       
while ((uint)i < (uint)a.Length)
    i = a[i];

The loop is transformed into a do/while loop and 2 OAK_NO_THROW assertions are generated. Assertion prop doesn't merge them at PHI and RangeCheck ignores them completely so the unnecessary range check is not eliminated.

While not exactly common this pattern can be useful when you have a linked list implemented using array indices rather than objects references (e.g. in Dictionary<K, V>). In such cases the end of the list is usually signaled by a -1 index and instead of checking for -1 one can use an unsigned check like above.

mikedn commented 7 years ago

Another type of range checks that RangeCheck eliminates look like the following:

obj.field = new byte[2];
obj.field[0] = 4;
obj.field[1] = 2;

Early prop runs very early (before VN and CSE) and has no way to propagate the array length so it's up to RangeCheck to figure it out.

Maybe assertion prop could extract as OAK_NO_THROW assertion from the array creation? Even if it can it's hard to tell if moving work from range check to assertion prop is useful.

mikedn commented 7 years ago

Fun case:

var a = new int[x];
for (int i = 0; i < a.Length; i++)
    s += a[i];

Range check eliminated by loop cloning - after deriving loop cloning conditions it figures out that they're always true and eliminates the range check without actually cloning the loop. Good.

Same code but with byte[] instead of int[] - loop cloning fails and RangeCheck gets to eliminate the range check. Loop cloning fails because optExtractArrIndex looks for a LSH(index, constant) but there's none for byte arrays.

mikedn commented 7 years ago

So aside special cases like byte[] loop cloning takes care of the typical case:

for (int i = 0; i < a.Length; i++)
    s += a[i];

when a is a local variable. RangeCheck helps again with the case when the array is field:

for (int i = 0; i < obj.field.Length; i++)
    s += obj.field[i];

where loop cloning can't do anything because it runs too early. Which is kind of strange, why increase the IR size before running other optimizations?

mikedn commented 7 years ago

Some potentially interesting RangeCheck stats (from corelib crossgen):

Name	Value
Range checks	5687
Removed range checks	295
Range - unknown	3572
Range - overflow	1036
Range widen - unknown	4
Range widen - out of bounds	780
Max search depth	54 (out of 100)
Max used budget	1481 (out of 8192)
Total time	36ms (out of 6860ms)
Map def time	15ms
Total memory	9.2 Mbytes
Max memory per method	268 Kbytes

Numbers do not include range checks that involve constants and do no require range computation.

The percentage of removed range checks is pretty low but that's not surprising considering that range checks are also removed in previous phases (earlyprop, assertionprop, loop cloning).

Both the search depth and budget are well below hard coded limits but the search depth is surprisingly deep.

dotnet / runtime

[RyuJIT] Range check phase effectiveness #8488