dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.32k stars 4.74k forks source link

[RyuJIT] Recognize and optimize constant set membership tests #8418

Open mikedn opened 7 years ago

mikedn commented 7 years ago

It would be useful if the JIT recognized tests like:

The second variant is sometimes implemented using a switch/jump table (in which case is much more simpler to recognize):

switch (c) {
case ' ':
case '\t':
case '\r':
case '\n':
    return true;
default:
    return false;
}

The reduction in number of conditional branches to 1 or 2 can make other optimizations more effective - if-conversion in particular.

Similar issue in the Roslyn repository: https://github.com/dotnet/roslyn/issues/20375

The first variant can be implemented by managed compilers but getting good code for the second is problematic if the target arch bitness is not known. Also, the BT instruction is not available in IL so it has to be simulated with shift/and/compare which the JIT has to recognize.

category:cq theme:basic-cq skill-level:intermediate cost:medium

mikedn commented 7 years ago

An experimental implementation shows a ~2700 bytes improvement for the first variant and a ~170 bytes improvement for the second variant. A proper implementation may do better but I don't expect it to be significantly better.

Example codegen for the first variant:

; before
       cmp      eax, 16
       jl       SHORT G_M38823_IG03
       cmp      eax, 18
       jle      SHORT G_M38823_IG05
; after
       lea      edx, [rax-16]
       cmp      edx, 2
       jbe      SHORT G_M38823_IG05

Example codegen for the second variant:

; before
       cmp      ecx, 0x1F41
       je       G_M36321_IG13
       cmp      ecx, 0x1F42
       je       G_M36321_IG13
       cmp      ecx, 0x1F49
       je       G_M36321_IG13
; after
       mov      eax, ecx
       sub      eax, 0x1F41
       cmp      eax, 8
       ja       G_M36321_IG05
       mov      r9d, 259
       bt       r9, rcx
       jb       G_M36321_IG14
BruceForstall commented 7 years ago

@dotnet/jit-contrib

mikedn commented 7 years ago

Switch to BT conversion yields a ~3700 bytes improvement. Example codegen:

; before
       cmp      eax, 3
       ja       SHORT G_M9431_IG07
       mov      eax, eax
       lea      rdx, [reloc @RWD00]
       mov      edx, dword ptr [rdx+4*rax]
       lea      rcx, G_M9431_IG02
       add      rdx, rcx
       jmp      rdx
; after
       cmp      eax, 3
       ja       SHORT G_M9431_IG07
       mov      edx, 3
       bt       edx, eax
       jb       SHORT G_M9431_IG05

It might be better but Roslyn seems to change rather quickly from switch table to binary search and that is much more difficult to recognize.

This transform is the cheapest of all in terms of compilation throughput since it's just a matter of looking at the number of switch arms.

mikedn commented 5 years ago

FWIW I've updated my old experiment (https://github.com/mikedn/coreclr/commit/991e2a038d1d6baa730ab7cc023c3912333c662e) and run the diffs again:

Total bytes of diff: -3905 (-0.01% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          33 : System.Runtime.Numerics.dasm (0.05% of base)
           4 : System.IO.Compression.dasm (0.00% of base)
           4 : System.ObjectModel.dasm (0.02% of base)
           3 : System.IO.IsolatedStorage.dasm (0.02% of base)
           2 : System.Runtime.Serialization.Formatters.dasm (0.00% of base)
Top file improvements by size (bytes):
       -1437 : System.Private.CoreLib.dasm (-0.04% of base)
        -318 : System.Private.Xml.dasm (-0.01% of base)
        -248 : Newtonsoft.Json.dasm (-0.04% of base)
        -190 : System.Data.Common.dasm (-0.02% of base)
        -178 : System.Text.Encoding.CodePages.dasm (-0.25% of base)
66 total files with size differences (61 improved, 5 regressed), 63 unchanged.
Top method regressions by size (bytes):
          68 ( 1.84% of base) : System.Runtime.Numerics.dasm - Number:ParseNumber(byref,long,int,byref,ref,ref,bool):bool
          41 ( 4.20% of base) : System.Net.Primitives.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
          41 ( 4.20% of base) : System.Private.Uri.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
          19 (11.24% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Scanner:ScanMultilineTrivia():struct:this
          19 ( 2.18% of base) : System.Private.CoreLib.dasm - Convert:TryFromBase64Chars(struct,struct,byref):bool
Top method improvements by size (bytes):
        -113 (-1.66% of base) : System.Private.CoreLib.dasm - Number:NumberToStringFormat(byref,byref,struct,ref)
         -62 (-12.40% of base) : System.Private.CoreLib.dasm - DateTime:TryCreate(int,int,int,int,int,int,int,byref):bool
         -60 (-1.54% of base) : System.Private.Xml.dasm - XmlSchemaInference:InferSimpleType(ref,byref):int
         -59 (-4.20% of base) : System.Security.AccessControl.dasm - GenericAce:CreateFromBinaryForm(ref,int):ref
         -51 (-2.17% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendFormatHelper(ref,ref,struct):ref:this
Top method regressions by size (percentage):
           6 (15.38% of base) : Microsoft.CodeAnalysis.dasm - AssemblyIdentity:IsWhiteSpace(ushort):bool
           6 (15.38% of base) : System.Private.CoreLib.dasm - Convert:IsSpace(ushort):bool
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlBaseWriter:IsWhitespace(ushort):bool:this
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - EncodingStreamWrapper:IsWhitespace(ubyte):bool
           6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlJsonReader:IsWhitespace(ubyte):bool
Top method improvements by size (percentage):
         -49 (-53.85% of base) : System.Net.HttpListener.dasm - HttpListener:IsClientFault(int):bool
         -34 (-44.74% of base) : System.Data.Common.dasm - ExpressionNode:IsIntegerSql(int):bool
         -35 (-37.23% of base) : System.Private.Uri.dasm - Uri:IsBadFileSystemCharacter(ushort):bool:this
         -20 (-35.71% of base) : System.Data.Common.dasm - ExpressionNode:IsInteger(int):bool
         -20 (-35.71% of base) : System.Private.Xml.dasm - XmlEntity:IsValidChildType(int):bool:this
756 total methods with size differences (644 improved, 112 regressed), 145991 unchanged.

Most of it comes from recognizing if ('0' <= c && c <= '9'). Some 700 bytes of diff come from recognizing if (c == ' ' || c == '\t' || c == '\r' || c == '\n').

The experiment was mostly written to get an idea how many opportunities for such transforms exist. It has various problems:

@AndyAyersMS Any opinions about implementing such optimizations in the JIT?

AndyAyersMS commented 5 years ago

I think doing it relatively late makes sense.

Running it earlier might reduce time and space needed by some analyses, but it fires pretty rarely, and the transformation doesn't seem to unblock any higher-level opts.

cc @dotnet/jit-contrib

EgorBo commented 5 years ago

The experiment was mostly written to get an idea how many opportunities for such transforms exist.

btw, I wonder if it makes sense to add some popular libs and applications (trending C# repos, most popular nuget packages) to the jit-diffs. People who write code for BCL and corelib are usually familiar with jit internals, e.g. https://github.com/dotnet/corefx/blob/master/src/System.Numerics.Vectors/src/System/Numerics/Quaternion.cs#L171 - developer knew JIT wouldn't replace / 2 with * 0.5, etc.

BruceForstall commented 5 years ago

@EgorBo It's a good idea, and we have a long-term plan to do that, probably using PMI-via-SuperPMI.

MihaZupan commented 5 years ago

I went a bit crazy after reading through @EgorBo's blog post and wrote an analyzer-codefix for many common character check patterns. Do you think something like this would be a candidate for a Roslyn/JIT codegen optimization?

I went a bit crazy and did it for ranges up to 128 wide.

It recognizes all Func<char, bool> that match the following patterns:

(Or any mix of the above)

And switch statements with the following patterns:

public static bool IsAsciiPunctuation(this char c)
{
    switch (c)
    {
        case '!':
        // ...
        case '~':
            return true;
    }
    return false;
}
public static bool IsAsciiPunctuation(this char c)
{
    switch (c)
    {
        case '!':
        // ...
        case '~':
            return true;

        default:
            return false;
    }
}

128 bit wide checks end up looking like this

public static bool IsEmailUsernameSpecialChar(char c)
{
    /* return ".!#$%&'*+/=?^_`{|}~-+.~".IndexOf(c) >= 0; */

    int testValue = c;
    if (testValue > 126)
        return false;
    return testValue < 64 ? ((-6917268469155102720L >> testValue) & 1) != 0 : ((8646911292067545088L >> (testValue - 64)) & 1) != 0;
}

More examples from Markdig's character helper class: gist