In https://github.com/dotnet/perf-autofiling-issues/issues/33182, we discovered that MonoJIT generates two `memcpy` calls for:

For this scenario, MonoJIT takes the "safe" path in `mini_emit_memcpy_internal` (the check `size / align > MAX_INLINE_COPIES` passes) instead of using `mini_emit_memcpy`, which handles the copy by unrolling it.

In comparison, `Unsafe.As<byte, Block64>(ref dest) = Unsafe.As<byte, Block64>(ref src);` leads to:

This is causing a serious regression on MonoJIT (https://github.com/dotnet/perf-autofiling-issues/issues/33182 and more). Fixing this would improve more than 400 microbenchmarks (https://github.com/dotnet/perf-autofiling-issues/issues/41406#issuecomment-2358489014).
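
To illustrate the path selection described above, here is a minimal C sketch of the `size / align > MAX_INLINE_COPIES` check. It is not the Mono source: `sketch_uses_memcpy_call` and `SKETCH_MAX_INLINE_COPIES` are hypothetical names, the threshold value of 16 is an assumption, and the real `mini_emit_memcpy_internal` may test further conditions that are not modeled here.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed threshold for illustration only; the actual value of
   MAX_INLINE_COPIES in Mono may differ. */
#define SKETCH_MAX_INLINE_COPIES 16

/* Hypothetical model of the check quoted in this issue.
   Returns true when the copy would be emitted as a memcpy call (the
   "safe" path), false when it would be unrolled into inline loads
   and stores. */
static bool
sketch_uses_memcpy_call (int size, int align)
{
	assert (size >= 0);
	assert (align > 0); /* guard against division by zero */

	/* size / align approximates the number of load/store pairs an
	   unrolled copy would need at the given alignment. */
	return size / align > SKETCH_MAX_INLINE_COPIES;
}
```

Under these assumptions, a 64-byte copy with `align == 1` gives 64 / 1 = 64, which exceeds the threshold and selects the `memcpy` call, while the same 64 bytes with `align == 8` gives 8 and stays on the unrolled path. This suggests the alignment passed to the helper matters as much as the size.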