Open mikedn opened 7 years ago
An experimental implementation shows a ~2700 bytes improvement for the first variant and a ~170 bytes improvement for the second variant. A proper implementation may do better but I don't expect it to be significantly better.
Example codegen for the first variant:
; before
cmp eax, 16
jl SHORT G_M38823_IG03
cmp eax, 18
jle SHORT G_M38823_IG05
; after
lea edx, [rax-16]
cmp edx, 2
jbe SHORT G_M38823_IG05
Example codegen for the second variant:
; before
cmp ecx, 0x1F41
je G_M36321_IG13
cmp ecx, 0x1F42
je G_M36321_IG13
cmp ecx, 0x1F49
je G_M36321_IG13
; after
mov eax, ecx
sub eax, 0x1F41
cmp eax, 8
ja G_M36321_IG05
mov r9d, 259
bt r9, rcx
jb G_M36321_IG14
@dotnet/jit-contrib
Switch to BT conversion yields a ~3700 bytes improvement. Example codegen:
; before
cmp eax, 3
ja SHORT G_M9431_IG07
mov eax, eax
lea rdx, [reloc @RWD00]
mov edx, dword ptr [rdx+4*rax]
lea rcx, G_M9431_IG02
add rdx, rcx
jmp rdx
; after
cmp eax, 3
ja SHORT G_M9431_IG07
mov edx, 3
bt edx, eax
jb SHORT G_M9431_IG05
It might be better but Roslyn seems to change rather quickly from switch table to binary search and that is much more difficult to recognize.
This transform is the cheapest of all in terms of compilation throughput since it's just a matter of looking at the number of switch arms.
FWIW I've updated my old experiment (https://github.com/mikedn/coreclr/commit/991e2a038d1d6baa730ab7cc023c3912333c662e) and run the diffs again:
Total bytes of diff: -3905 (-0.01% of base)
diff is an improvement.
Top file regressions by size (bytes):
33 : System.Runtime.Numerics.dasm (0.05% of base)
4 : System.IO.Compression.dasm (0.00% of base)
4 : System.ObjectModel.dasm (0.02% of base)
3 : System.IO.IsolatedStorage.dasm (0.02% of base)
2 : System.Runtime.Serialization.Formatters.dasm (0.00% of base)
Top file improvements by size (bytes):
-1437 : System.Private.CoreLib.dasm (-0.04% of base)
-318 : System.Private.Xml.dasm (-0.01% of base)
-248 : Newtonsoft.Json.dasm (-0.04% of base)
-190 : System.Data.Common.dasm (-0.02% of base)
-178 : System.Text.Encoding.CodePages.dasm (-0.25% of base)
66 total files with size differences (61 improved, 5 regressed), 63 unchanged.
Top method regressions by size (bytes):
68 ( 1.84% of base) : System.Runtime.Numerics.dasm - Number:ParseNumber(byref,long,int,byref,ref,ref,bool):bool
41 ( 4.20% of base) : System.Net.Primitives.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
41 ( 4.20% of base) : System.Private.Uri.dasm - IPv6AddressHelper:Parse(struct,struct,int,byref)
19 (11.24% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Scanner:ScanMultilineTrivia():struct:this
19 ( 2.18% of base) : System.Private.CoreLib.dasm - Convert:TryFromBase64Chars(struct,struct,byref):bool
Top method improvements by size (bytes):
-113 (-1.66% of base) : System.Private.CoreLib.dasm - Number:NumberToStringFormat(byref,byref,struct,ref)
-62 (-12.40% of base) : System.Private.CoreLib.dasm - DateTime:TryCreate(int,int,int,int,int,int,int,byref):bool
-60 (-1.54% of base) : System.Private.Xml.dasm - XmlSchemaInference:InferSimpleType(ref,byref):int
-59 (-4.20% of base) : System.Security.AccessControl.dasm - GenericAce:CreateFromBinaryForm(ref,int):ref
-51 (-2.17% of base) : System.Private.CoreLib.dasm - StringBuilder:AppendFormatHelper(ref,ref,struct):ref:this
Top method regressions by size (percentage):
6 (15.38% of base) : Microsoft.CodeAnalysis.dasm - AssemblyIdentity:IsWhiteSpace(ushort):bool
6 (15.38% of base) : System.Private.CoreLib.dasm - Convert:IsSpace(ushort):bool
6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlBaseWriter:IsWhitespace(ushort):bool:this
6 (15.38% of base) : System.Private.DataContractSerialization.dasm - EncodingStreamWrapper:IsWhitespace(ubyte):bool
6 (15.38% of base) : System.Private.DataContractSerialization.dasm - XmlJsonReader:IsWhitespace(ubyte):bool
Top method improvements by size (percentage):
-49 (-53.85% of base) : System.Net.HttpListener.dasm - HttpListener:IsClientFault(int):bool
-34 (-44.74% of base) : System.Data.Common.dasm - ExpressionNode:IsIntegerSql(int):bool
-35 (-37.23% of base) : System.Private.Uri.dasm - Uri:IsBadFileSystemCharacter(ushort):bool:this
-20 (-35.71% of base) : System.Data.Common.dasm - ExpressionNode:IsInteger(int):bool
-20 (-35.71% of base) : System.Private.Xml.dasm - XmlEntity:IsValidChildType(int):bool:this
756 total methods with size differences (644 improved, 112 regressed), 145991 unchanged.
Most of it comes from recognizing if ('0' <= c && c <= '9')
. Some 700 bytes of diff come from recognizing if (c == ' ' || c == '\t' || c == '\r' || c == '\n')
.
The experiment was mostly written to get an idea how many opportunities for such transforms exist. It has various problems:
bool b = ('0' <= c && c <= '9')
isn't recognized because the second operand of &&
is just c <= '9'
rather than c <= '9' ? true : false
. The current implementation only looks at compares involved in flow control.('0' <= c && c <= '9')
only works with constant bounds. Something like (0 <= c && c < x)
should also work.(c == ' ' || c == '\t' || c == '\r' || c == '\n')
generates its own range check (since the bitmask has to have at most 64 bits) while the code may already contain a suitable range check (e.g. (c < ' ' && (c == ' ' || c == '\t' || c == '\r' || c == '\n'))
)(c == ' ' || c == '\t' || ...)
also fails if the range exceeds 64 and the checks aren't suitably ordered. RSZ
and AND
in HIR.@AndyAyersMS Any opinions about implementing such optimizations in the JIT?
I think doing it relatively late makes sense.
Running it earlier might reduce time and space needed by some analyses, but it fires pretty rarely, and the transformation doesn't seem to unblock any higher-level opts.
cc @dotnet/jit-contrib
The experiment was mostly written to get an idea how many opportunities for such transforms exist.
btw, I wonder if it makes sense to add some popular libs and applications (trending C# repos, most popular nuget packages) to the jit-diffs. People who write code for BCL and corelib are usually familiar with jit internals, e.g. https://github.com/dotnet/corefx/blob/master/src/System.Numerics.Vectors/src/System/Numerics/Quaternion.cs#L171 - developer knew JIT wouldn't replace / 2
with * 0.5
, etc.
@EgorBo It's a good idea, and we have a long-term plan to do that, probably using PMI-via-SuperPMI.
I went a bit crazy after reading through @EgorBo's blog post and wrote an analyzer-codefix for many common character check patterns. Do you think something like this would be a candidate for a Roslyn/JIT codegen optimization?
I went a bit crazy and did it for ranges up to 128 wide.
It recognizes all Func<char, bool>
that match the following patterns:
return c == 'a' || c == 'b' ...
return (c >= 'A' && c <= Z') || (c >= 'a' && c <= 122) ...
, also \< and > and rotated(Or any mix of the above)
"abcdef".IndexOf(c) >= 0
or > -1
or rotated ...
public static bool IsEmailUsernameSpecialChar(char c)
{
return ".!#$%&'*+/=?^_`{|}~-+.~".IndexOf(c) >= 0;
}
And switch statements with the following patterns:
public static bool IsAsciiPunctuation(this char c)
{
switch (c)
{
case '!':
// ...
case '~':
return true;
}
return false;
}
public static bool IsAsciiPunctuation(this char c)
{
switch (c)
{
case '!':
// ...
case '~':
return true;
default:
return false;
}
}
128 bit wide checks end up looking like this
public static bool IsEmailUsernameSpecialChar(char c)
{
/* return ".!#$%&'*+/=?^_`{|}~-+.~".IndexOf(c) >= 0; */
int testValue = c;
if (testValue > 126)
return false;
return testValue < 64 ? ((-6917268469155102720L >> testValue) & 1) != 0 : ((8646911292067545088L >> (testValue - 64)) & 1) != 0;
}
It would be useful if the JIT recognized tests like:
if ('0' <= c && c <= '9')
- can be reduced to a single branch + compare + subtractif (c == ' ' || c == '\t' || c == '\r' || c == '\n')
- can be reduced to 2 branches + compare + bit test (BT instruction)The second variant is sometimes implemented using a switch/jump table (in which case is much more simpler to recognize):
The reduction in number of conditional branches to 1 or 2 can make other optimizations more effective - if-conversion in particular.
Similar issue in the Roslyn repository: https://github.com/dotnet/roslyn/issues/20375
The first variant can be implemented by managed compilers but getting good code for the second is problematic if the target arch bitness is not known. Also, the BT instruction is not available in IL so it has to be simulated with shift/and/compare which the JIT has to recognize.
category:cq theme:basic-cq skill-level:intermediate cost:medium