dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
15.41k stars 4.76k forks source link

Suboptimal codegen with `StartsWith(literal)` #110019

Open MihaZupan opened 1 day ago

MihaZupan commented 1 day ago

When switching from the BitConverter.ToUInt64 hack to the more readable StartsWith(literal), the codegen slightly regresses (doesn't matter for actual perf in this case).

private static readonly ulong s_http10Bytes = BitConverter.ToUInt64("HTTP/1.0"u8);
private static readonly ulong s_http11Bytes = BitConverter.ToUInt64("HTTP/1.1"u8);

private static int Parse1(ReadOnlySpan<byte> line)
{
    if (line.Length < 12 || line[8] != ' ') return 1;

    ulong first8Bytes = BitConverter.ToUInt64(line);
    if (first8Bytes == s_http11Bytes) return 2;
    if (first8Bytes == s_http10Bytes) return 3;

    return 4;
}

private static int Parse2(ReadOnlySpan<byte> line)
{
    if (line.Length < 12 || line[8] != ' ') return 1;

    if (line.StartsWith("HTTP/1.1"u8)) return 2;
    if (line.StartsWith("HTTP/1.0"u8)) return 3;

    return 4;
}

https://www.diffchecker.com/X4PhzKVx/

Patterns like

-cmp       rcx,rax
-jne       short M01_L00
+cmp       rcx,[rax]
+sete      al
+movzx     eax,al
+test      eax,eax
+je        short M01_L00
dotnet-policy-service[bot] commented 1 day ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch See info in area-owners.md if you want to be subscribed.

jakobbotsch commented 19 hours ago

This probably needs some form of late forward substitution. Perhaps we should look into removing single-def single-use locals during rationalization (I have some code lying around to do this...).

jakobbotsch commented 19 hours ago

The problem is that we end up with IR like

x = (len == 4 ? SequenceEqual(a, b, 4) : SequenceEqual(a, b, len));
if (x != 0)
{
  passed:
  // do something
}

and later we are able to show that len == 4 is impossible, but the local sticks around.

It might be that we can teach qmark expansion to do some jump threading directly for cases like this, avoiding the intermediate local. Essentially qmark expansion would need to realize that the x != 0 is a last use of x and then instead of the above pseudocode produce IR corresponding to

if (SequenceEqual(a, b, 4) || SequenceEqual(a, b, len))
  goto passed;
...

It requires liveness, but we do have liveness available when qmark expansion runs. However making use of it might interfere with @AndyAyersMS's future plans to move qmark expansion backwards...

jakobbotsch commented 19 hours ago

Although the problem also happens with DOTNET_JitProfileValues=0, so the qmark thing does not explain it fully. In that case we end up (after inlining StartsWith) with something corresponding to the following:

bool x;
if ("HTTP/1.1"u8.Length > line.Length)
  x = false;
else
  x = SequenceEqual(a, b, c);
if (x)
{
  passed:
  // do something
}

RBO is able to prove that "HTTP/1.1"u8.Length > line.Length is false given the manual test made at the beginning of the function, but we still have that intermediate local around that results in this codegen.

We either need some smarter jump threading here (changing the x = SequenceEqual(a, b, c) to something like if (SequenceEqual(a, b, c)) goto passed;); or we need some form of late forward substitution.