dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

JIT: int64 comparisons can be optimized when both arguments are known to fit into int32 #11394

Open GrabYourPitchforks opened 5 years ago

GrabYourPitchforks commented 5 years ago

As of https://github.com/dotnet/coreclr/pull/20771, calling someString.Slice(3, 4) results in the following codegen (x64):

; assume rdx is a reference to a non-null System.String
mov eax, dword ptr [rdx + 8] ; eax := someString.Length
mov eax, eax ; zero-extend to 64-bit
cmp rax, 7 ; if ((ulong)(uint)start + (ulong)(uint)length > (ulong)(uint)_length) { /* THROW */ }, where start and length are consts
jb <THROW_OUT_OF_RANGE>
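
For readers who prefer source to assembly, the check in that listing can be modelled roughly as follows. This is a minimal sketch in C, not the actual runtime source; the function name `slice_out_of_range_64` and its parameter names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the range check shown in the listing above.
   Each 32-bit value is zero-extended to 64 bits before the addition,
   so the sum cannot wrap around. */
static bool slice_out_of_range_64(int32_t start, int32_t length, int32_t string_length)
{
    assert(string_length >= 0); /* a string's length is never negative */
    uint64_t end = (uint64_t)(uint32_t)start + (uint64_t)(uint32_t)length;
    return end > (uint64_t)(uint32_t)string_length;
}
```

With start = 3 and length = 4 the sum folds to the constant 7, so the check reduces to "throw if the string length is below 7", which is what the `cmp rax, 7` / `jb` pair encodes.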

Ideally if start and length are known constant non-negative integers, this could be further optimized by having the JIT recognize that their sum must also be able to fit within a 32-bit integer without overflow and performing the comparison directly between the memory address and an imm32:

; assume rdx is a reference to a non-null System.String
cmp dword ptr [rdx + 8], 7 ; if ((uint)7 > (uint)_length) { /* THROW */ }
jb <THROW_OUT_OF_RANGE>

Even if the inputs aren't constant, if the JIT can reason that they're both non-negative (perhaps because their sign has been checked previously in the current frame), it should ideally still be able to reason that overflow cannot occur and that the comparison can proceed using 32-bit instructions.

lea temp, [start + length] ; 32-bit lea
cmp dword ptr [rdx + 8], temp ; 32-bit compare
jb <THROW_OUT_OF_RANGE>
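
The reasoning behind the narrowing can be sketched in C. The function names below are hypothetical and this is only a model of the arithmetic, not JIT code: two non-negative 32-bit signed values sum to at most 2 * INT32_MAX = 4294967294, which still fits in an unsigned 32-bit integer, so the 32-bit and 64-bit checks agree. Without the non-negativity precondition they can disagree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reference check: zero-extend to 64 bits so the addition cannot wrap. */
static bool out_of_range_wide(int32_t start, int32_t length, uint32_t string_length)
{
    return (uint64_t)(uint32_t)start + (uint64_t)(uint32_t)length > (uint64_t)string_length;
}

/* Same check in 32-bit arithmetic, with no precondition.
   The sum wraps modulo 2^32 if either input is negative. */
static bool out_of_range_narrow_unchecked(int32_t start, int32_t length, uint32_t string_length)
{
    uint32_t end = (uint32_t)((uint32_t)start + (uint32_t)length);
    return end > string_length;
}

/* Narrow check guarded by the property the JIT would have to prove:
   both inputs are non-negative, so the 32-bit sum cannot wrap. */
static bool out_of_range_narrow(int32_t start, int32_t length, uint32_t string_length)
{
    assert(start >= 0 && length >= 0);
    return out_of_range_narrow_unchecked(start, length, string_length);
}
```

For example, with start = -1 and length = 1 the 32-bit sum wraps to 0 and the unchecked narrow version would not throw, while the 64-bit version computes 0x100000000 and does. That is why the transformation depends on knowing the signs.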

category:cq theme:optimization skill-level:expert cost:medium

ahsonkhan commented 5 years ago

cc @dotnet/jit-contrib

GrabYourPitchforks commented 5 years ago

Another example of a case that can be optimized is when start is zero. Then the 32-to-64-bit extension doesn't have to take place at all, and the ideal codegen would then be:

cmp dword ptr [rdx + 8], length ; 32-bit compare
jb <THROW_OUT_OF_RANGE>
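
A small C model of why the start == 0 case needs no sign information at all (names are hypothetical, and this is a sketch of the arithmetic rather than runtime code): with start fixed at zero there is no addition, so nothing can overflow, and a negative length reinterpreted as unsigned is at least 2^31 and therefore larger than any valid string length.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* General form of the check, using 64-bit arithmetic. */
static bool slice_out_of_range(int32_t start, int32_t length, uint32_t string_length)
{
    return (uint64_t)(uint32_t)start + (uint64_t)(uint32_t)length > (uint64_t)string_length;
}

/* Specialization for start == 0: a single unsigned 32-bit compare. */
static bool prefix_out_of_range(int32_t length, uint32_t string_length)
{
    return (uint32_t)length > string_length;
}

/* Asserts that both forms agree for one pair of inputs. */
static void assert_same_result(int32_t length, uint32_t string_length)
{
    assert(prefix_out_of_range(length, string_length) ==
           slice_out_of_range(0, length, string_length));
}
```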

mikedn commented 5 years ago

> Ideally if start and length are known constant non-negative integers, this could be further optimized by having the JIT recognize that their sum must also be able to fit within a 32-bit integer without overflow and performing the comparison directly between the memory address and an imm32:

This can probably be done in compare lowering, where something similar is already done for byte-to-int casts: the cast is removed and a byte compare is generated. At least in this specific case the transform is xarch-specific anyway; on ARM you still need to load the length into a register.
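
The byte-to-int analogy can be illustrated with a C model. This is an assumption about the general shape of the idea, not the JIT's lowering code, and the function names are hypothetical: when the constant fits in a byte, comparing the widened value gives the same answer as comparing the byte directly, so the cast can be dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Comparison as written in source: the byte is widened to int first. */
static bool less_than_widened(uint8_t value, int constant)
{
    return (int)value < constant;
}

/* Narrowed form: compare as bytes. Valid only when the constant
   itself fits in a byte, which the assert makes explicit. */
static bool less_than_narrow(uint8_t value, int constant)
{
    assert(constant >= 0 && constant <= UINT8_MAX);
    return value < (uint8_t)constant;
}
```

The int64 case in this issue has the same structure one size up: the operands are known to fit in 32 bits, so the widening to 64 bits contributes nothing to the result of the compare.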

> mov eax, eax ; zero-extend to 64-bit

This should go away with dotnet/coreclr#12676

> if the JIT can reason that they're both non-negative (perhaps because their sign has been checked previously in the current frame)

That's probably quite far away. The JIT isn't exactly good at tracking properties such as "this value is known to be positive"...

AndyAyersMS commented 5 years ago

Not likely we'll get to this in 3.0.