pgavlin opened this issue 7 years ago
cc @JosephTremoulet @jkotas @dotnet/jit-contrib
Other optimizations in this bucket:
- Make checked write barriers faster: the checked write barriers can use the same inlined-immediates optimization used for the unchecked write barriers.
- Bulk checked write barriers for larger structs: a single bulk write-barrier call for the whole struct. The implementation amortizes the bounds checks, etc. This optimization is done by NUTC in ProjectN: https://github.com/dotnet/corert/blob/e70ea6b9df34af48f9b6dc47a16d18cc430f6724/src/Native/Runtime/GCMemoryHelpers.cpp#L80
See dotnet/coreclr#13006 for an example where a ranged assign/check would be useful.
Also https://github.com/dotnet/coreclr/issues/22661 for initializing structs in-place rather than init+assign (copy).
@stephentoub recently brought up a couple of scenarios in which the JIT selects suboptimal write barriers. In particular, the use of unnecessary checked write barriers was taking up nearly 10% of the cycles spent in a simple microbenchmark involving `CancellationToken`s.

Opaque Byrefs
The JIT currently attempts to decide what sort of write barrier is necessary by pattern-matching on the target address. While this is sufficient for many cases, it is less than optimal when the target address is an opaque byref (e.g. when it is a byref-typed lclVar). We can do better here by using value numbers when available; see e.g. the changes at https://github.com/dotnet/coreclr/compare/master...pgavlin:VNWriteBarrier. This can improve the performance of these stores by quite a bit, as it allows the compiler to use unchecked write barriers in more cases: the changes in the aforementioned branch allow stores through byref-typed locals whose targets value numbering can prove are on the GC heap to use an unchecked rather than a checked write barrier.
Struct copies
As it stands, the JIT always uses `CORINFO_HELP_ASSIGN_BYREF` when copying structs that contain GC references. This issue is a bit trickier to solve, as this helper has additional semantics beyond the write barrier: in particular, the destination and source addresses are passed in `EDI`/`RDI` and `ESI`/`RSI` respectively and are incremented before the helper returns. This behavior (mostly) matches that of the `movs` instruction. Though we could use the existing unchecked barrier, doing so would require us to adjust our current codegen strategy for struct copies, which just uses `movs`/`rep movs`/`ASSIGN_BYREF` and therefore requires at most three registers. Replacing this implementation with something that could generate calls to the existing unchecked barrier would require either additional registers (as this helper takes its destination and source arguments in the usual registers for the target ABI) or additional instructions (instead of using `movsp`, we could manually increment the src and dest addresses and use arbitrary registers). The right solution is probably to implement a new helper, perhaps something like `CORINFO_HELP_MOVS_REF`, that provides unchecked barrier semantics but with the calling convention and post-increment behavior of `CORINFO_HELP_ASSIGN_BYREF`. This would improve codegen for such struct copies.

Thoughts?
category:cq theme:barriers skill-level:expert cost:medium