We will continue to improve code quality for Arm64 targets in .NET 10 to benefit customers who run, or want to run, their workloads on Arm64 hardware.
General optimizations
PAC/RET feature enablement
[ ] Cobalt 100 hardware supports the pointer-authentication extension. As a security measure, we would like to add support for it in .NET 10, both in the .NET runtime and in JIT-generated code. More details can be found in https://github.com/dotnet/runtime/issues/109457
Compact encoding
[ ] Improve code quality by using instructions that perform more than one operation, which makes the Arm64 encoding more compact. As part of this work, we will also revisit addressing modes that are currently ignored or rarely used (e.g. the post-index addressing mode) but can give much better code quality. https://github.com/dotnet/runtime/issues/68028
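As a rough illustration of the post-index idea, the sketch below models a peephole that fuses a load and the following base-register increment into one post-indexed load (`ldr x0, [x1], #8` in Arm64 assembly). The tuple-based instruction format and the `fuse_post_index` helper are hypothetical and are not how the JIT represents or optimizes code; the only architectural facts relied on are that the post-index immediate is a signed 9-bit value and that the loaded register must differ from the written-back base.

```python
# Toy instruction forms (hypothetical, for illustration only):
#   ("ldr", rt, rn)          load rt from [rn]
#   ("add", rd, rn, imm)     rd = rn + imm
#   ("ldr_post", rt, rn, imm) load rt from [rn], then rn += imm

def fuse_post_index(instrs):
    """Fuse `ldr rt, [rn]` + `add rn, rn, #imm` into one post-indexed load."""
    out = []
    i = 0
    while i < len(instrs):
        cur = instrs[i]
        nxt = instrs[i + 1] if i + 1 < len(instrs) else None
        if (
            nxt is not None
            and cur[0] == "ldr"
            and nxt[0] == "add"
            and nxt[1] == nxt[2] == cur[2]  # the add updates the load's base in place
            and cur[1] != cur[2]            # writeback needs rt != rn
            and -256 <= nxt[3] <= 255       # post-index immediate is signed 9-bit
        ):
            out.append(("ldr_post", cur[1], cur[2], nxt[3]))
            i += 2  # both instructions were consumed
        else:
            out.append(cur)
            i += 1
    return out


before = [("ldr", "x0", "x1"), ("add", "x1", "x1", 8), ("ret",)]
after = fuse_post_index(before)
```

Here `after` holds two instructions instead of three, which is the kind of encoding saving this work item is after.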
[ ] Modernize write barriers for Arm64: In various benchmarks, we have seen that the write barrier on Arm64 is more time consuming than its x86 counterpart. This is despite the fact that Arm64 has a conservative write barrier (which does less work) instead of the precise write barrier present on x86 (which does more work). The first step is to analyze the results from our experiments done in https://github.com/dotnet/runtime/issues/106051. The next step would be to investigate and enable the precise write barrier for Arm64. On x64, it showed significant wins in GC pause time and hence in overall throughput. Another thing we want to explore is whether having multiple versions of the write barrier, similar to x86, would give us any benefit.
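To make the conservative/precise distinction concrete, here is a deliberately simplified card-marking model. It is a sketch under assumptions, not the runtime's barrier: the `ToyHeap` class, the card and region sizes, and the exact checks each barrier performs are all illustrative. The model only captures the trade-off described above, where the conservative barrier does less work per store and the precise barrier does an extra generation lookup so that fewer cards are left for the GC to scan.

```python
CARD_SIZE = 256      # bytes covered by one card (illustrative value)
REGION_SIZE = 1024   # bytes per heap region (illustrative value)


class ToyHeap:
    def __init__(self, gen_of_region):
        self.gen_of_region = gen_of_region  # region index -> generation number
        self.cards = set()                  # indices of marked (dirty) cards

    def generation(self, addr):
        return self.gen_of_region[addr // REGION_SIZE]

    def conservative_barrier(self, dst_addr, ref_addr):
        # Cheap check only: any non-null reference store marks the card.
        if ref_addr is not None:
            self.cards.add(dst_addr // CARD_SIZE)

    def precise_barrier(self, dst_addr, ref_addr):
        # Extra work per store: mark only when an older object
        # now points at a younger one.
        if ref_addr is None:
            return
        if self.generation(ref_addr) < self.generation(dst_addr):
            self.cards.add(dst_addr // CARD_SIZE)


def marked_cards(barrier_name, stores):
    heap = ToyHeap({0: 0, 1: 2})  # region 0 is gen0, region 1 is gen2
    barrier = getattr(heap, barrier_name)
    for dst_addr, ref_addr in stores:
        barrier(dst_addr, ref_addr)
    return heap.cards


# (destination address, address of the stored reference)
stores = [(1024, 16), (1536, 1100), (32, 1200)]
conservative = marked_cards("conservative_barrier", stores)
precise = marked_cards("precise_barrier", stores)
```

In this model the precise barrier leaves a subset of the cards that the conservative one marks, which is one plausible reason for the GC pause-time wins mentioned for x64.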
The primary prerequisite before starting the design of streaming-mode SVE and SME is to add vector-length (VL) agnostic support to the JIT and the .NET runtime. This includes the following:
[ ] Introduce TYP_SIMD and teach the various JIT code paths about the new type. See whether some portion of this can be achieved based on how we handle stackalloc.
[ ] Make sure getVectorTByteLength() returns the VL that is available on the hardware, and fix all the affected JIT code paths.
[ ] Sort locals so that TYP_SIMD / TYP_MASK locals come last. They will be placed at the bottom of the stack frame layout.
[ ] Access the stack offsets of TYP_SIMD / TYP_MASK locals using SVE instructions.
[ ] Enable VL-agnostic non-streaming SVE for NativeAOT / crossgen.
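The local-sorting and stack-offset items can be sketched with a small model. Everything here is hypothetical (the `Local` record, `layout_frame`, and `vector_offset` are not JIT APIs), and alignment, frame direction, and mask sizing are ignored. The point it illustrates is that once scalable locals are grouped after the fixed-size ones, fixed offsets are known at compile time, while a vector local can be described by a slot number that is scaled by VL only when the code runs. That matches how SVE loads and stores can take an immediate offset expressed in multiples of VL.

```python
from dataclasses import dataclass

SCALABLE_TYPES = {"TYP_SIMD", "TYP_MASK"}


@dataclass
class Local:
    name: str
    typ: str
    size: int = 0  # bytes; only meaningful for fixed-size locals


def layout_frame(frame_locals):
    """Place fixed-size locals first and scalable locals last."""
    # sorted() is stable, so locals keep their relative order within each group.
    ordered = sorted(frame_locals, key=lambda loc: loc.typ in SCALABLE_TYPES)
    fixed_offsets, vector_slots, mask_slots = {}, {}, {}
    fixed_size = 0
    for loc in ordered:
        if loc.typ == "TYP_SIMD":
            vector_slots[loc.name] = len(vector_slots)  # index in units of VL
        elif loc.typ == "TYP_MASK":
            mask_slots[loc.name] = len(mask_slots)      # index in predicate-size units
        else:
            fixed_offsets[loc.name] = fixed_size        # known at compile time
            fixed_size += loc.size
    return fixed_offsets, fixed_size, vector_slots, mask_slots


def vector_offset(fixed_size, slot, vl_bytes):
    """Byte offset of a vector local once the hardware VL is known."""
    return fixed_size + slot * vl_bytes


frame = [
    Local("v0", "TYP_SIMD"),
    Local("i", "TYP_LONG", 8),
    Local("m", "TYP_MASK"),
    Local("p", "TYP_LONG", 8),
    Local("v1", "TYP_SIMD"),
]
fixed_offsets, fixed_size, vector_slots, mask_slots = layout_frame(frame)
```

The same `vector_slots` table serves every vector length: `vector_offset(fixed_size, vector_slots["v1"], 16)` and `vector_offset(fixed_size, vector_slots["v1"], 32)` differ only in the runtime VL, which is what makes the layout VL agnostic.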
Improvements in GC
Scalable Vector Extension
Wrap the non-streaming SVE work
Add support for vector length agnostic
Reference: https://github.com/dotnet/runtime/issues/101477
Design streaming mode SVE and SME
Stretch
References: