pgavlin opened this issue 7 years ago
cc @JosephTremoulet @jkotas @dotnet/jit-contrib
Other optimizations in this bucket:
- Make checked write barriers faster: the checked write barriers can use the same inlined-immediates optimization used for the unchecked write barriers.
- Bulk checked write barriers for larger structs: a single bulk write-barrier call for the whole struct. The implementation amortizes the bounds checks, etc. This optimization is done by NUTC in ProjectN: https://github.com/dotnet/corert/blob/e70ea6b9df34af48f9b6dc47a16d18cc430f6724/src/Native/Runtime/GCMemoryHelpers.cpp#L80
See dotnet/coreclr#13006 for an example where a ranged assign/check would be useful.
Also https://github.com/dotnet/coreclr/issues/22661 for initializing structs in-place rather than init+assign (copy).
@stephentoub recently brought up a couple of scenarios in which the JIT selects suboptimal write barriers. In particular, the use of unnecessary checked write barriers was taking up nearly 10% of the cycles spent in a simple microbenchmark involving `CancellationToken`s.

Opaque Byrefs
The JIT currently attempts to decide what sort of write barrier is necessary by pattern-matching on the target address. While this is sufficient for many cases, it is less than optimal when the target address is an opaque byref (e.g. when it is a byref-typed lclVar). We can do better here by using value numbers when available; see e.g. the changes at https://github.com/dotnet/coreclr/compare/master...pgavlin:VNWriteBarrier. This can improve the performance of these stores by quite a bit, as it allows the compiler to use unchecked write barriers in more cases: the changes in the aforementioned branch allow stores through byref-typed locals whose targets value numbering can prove are on the GC heap to use an unchecked rather than a checked write barrier.
Struct copies
As it stands, the JIT always uses `CORINFO_HELP_ASSIGN_BYREF` when copying structs that contain GC references. This issue is a bit trickier to solve, as this helper has additional semantics beyond the write barrier: in particular, the destination and source addresses are passed in `EDI`/`RDI` and `ESI`/`RSI` respectively and are incremented before the helper returns. This behavior (mostly) matches that of the `movs` instruction. Though we could use the existing unchecked barrier, doing so would require us to adjust our current codegen strategy for struct copies, which just uses `movs`/`rep movs`/`ASSIGN_BYREF` and therefore requires at most three registers. Replacing this implementation with something that could generate calls to the existing unchecked barrier would require either additional registers (as this helper takes its destination and source arguments in the usual registers for the target ABI) or additional instructions (instead of using `movsp`, we could manually increment the src and dest addresses and use arbitrary registers). The right solution is probably to implement a new helper, perhaps something like `CORINFO_HELP_MOVS_REF`, that provides unchecked barrier semantics but with the calling convention and post-increment behavior of `CORINFO_HELP_ASSIGN_BYREF`. This would improve codegen for such struct copies.

Thoughts?
category:cq theme:barriers skill-level:expert cost:medium