dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Enable multi-register intrinsics support for Arm64 #64921

Closed echesakov closed 6 months ago

echesakov commented 2 years ago

Overview

We achieved parity with x64 for most Arm64 intrinsics in .NET 5. The exception is multi-register intrinsics, which need more work before they can be enabled on Arm64. The work is cross-cutting: getting working intrinsics involves changes in the JIT, the libraries, and Mono.
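To make "multi-register" concrete: on Arm64, instructions such as `LD2`-`LD4`, `ST1`-`ST4`, `TBL` and `TBX` take a list of consecutive SIMD registers as a single operand. The Python sketch below is a scalar model of what the de-interleaving loads and the table lookups compute, restricted to 16-byte (`.16B`) registers. The function names (`ldn`, `tbl`, `tbx`) are hypothetical, and the sketch says nothing about the C# API shape or about how the JIT allocates the registers.

```python
# Scalar model (byte lanes only) of Arm64 Advanced SIMD instructions that
# read or write a list of consecutive vector registers. It illustrates the
# instruction semantics only; it is not the .NET intrinsics API.
from typing import List, Sequence, Tuple

LANES = 16  # bytes per 128-bit V register

Reg = List[int]


def ldn(memory: Sequence[int], n: int) -> Tuple[Reg, ...]:
    """Model LDn {Vt.16B, ...}, [Xn] for n in 1..4 (one-register LD1, LD2-LD4).

    Reads n * 16 bytes and de-interleaves them: lane j of the k-th
    destination register receives memory[n * j + k].
    (The multi-register forms of LD1 load contiguously instead and are
    not modelled here.)
    """
    if not 1 <= n <= 4:
        raise ValueError("n must be between 1 and 4")
    if len(memory) < n * LANES:
        raise ValueError("memory holds fewer than n * 16 bytes")
    return tuple([memory[n * j + k] for j in range(LANES)] for k in range(n))


def _flatten(table: Sequence[Reg]) -> Reg:
    # TBL/TBX treat 1..4 consecutive table registers as one byte table.
    if not 1 <= len(table) <= 4:
        raise ValueError("table must have 1 to 4 registers")
    return [b for reg in table for b in reg]


def tbl(table: Sequence[Reg], idx: Reg) -> Reg:
    """Model TBL: out-of-range indices produce 0."""
    flat = _flatten(table)
    return [flat[i] if i < len(flat) else 0 for i in idx]


def tbx(dest: Reg, table: Sequence[Reg], idx: Reg) -> Reg:
    """Model TBX: out-of-range indices keep the destination lane."""
    flat = _flatten(table)
    return [flat[i] if i < len(flat) else d for d, i in zip(dest, idx)]


if __name__ == "__main__":
    r0, r1, r2 = ldn(list(range(48)), 3)  # like LD3 {V0.16B-V2.16B}, [X0]
    print(r0[:4], r1[:4], r2[:4])  # [0, 3, 6, 9] [1, 4, 7, 10] [2, 5, 8, 11]
    idx = [0, 16, 31, 32] + [0] * 12  # index 32 is past a 2-register table
    print(tbl((r0, r1), idx)[:4], tbx([99] * 16, (r0, r1), idx)[:4])
    # [0, 1, 46, 0] [0, 1, 46, 99]
```

An `LD3` result is three values that have to sit in three adjacent registers, which is what the issue means by a value returned "in a sequence of registers", as opposed to a pair of results in independent registers.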

Work Items

- [ ] Enable multi-register intrinsics in the JIT
- [ ] #39243
- [ ] Support register allocation for intrinsics returning a value in a sequence of registers (e.g. `V0-V2`). Note that this is different from `LoadPairVector64/128`, which returns its result in two independent SIMD registers.
- [ ] Propose and approve multi-register intrinsics APIs on Arm64 (these are expected to "look" similar to `LoadPairVector`/`StorePairVector`, i.e. using a `ValueTuple` of `Vector` values to express multi-register values)
- [ ] Implement multi-register intrinsics on Arm64 (such as ones that will expose the `LD[1-4]`, `ST[1-4]`, `TBL`, and `TBX` instructions)

Follow-up (after the JIT work is completed)

- [ ] Libraries support for using the new intrinsics
- [ ] monoVM support for the new intrinsics

Benchmarks to use

- Microbenchmarks (for the library methods that will be intrinsified with the new intrinsics)

category:cq theme:register-allocator skill-level:expert cost:medium impact:medium

ghost commented 2 years ago

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details
Author: echesakovMSFT
Assignees: -
Labels: `arch-arm64`, `area-CodeGen-coreclr`, `User Story`
Milestone: -
EgorBo commented 2 years ago

will it fix div on x64 returning result and mod in two registers? 🙂

echesakov commented 2 years ago

> will it fix div on x64 returning result and mod in two registers? 🙂

Not this work item, which I tend to think of as "LSRA allocating spans of registers" work.

However, #64864 should lay the foundation for DivRem (or MultiplyNoFlags2) intrinsics that return pairs of values. #64857 is another relevant piece; it is needed to avoid unnecessary `mov`s with multi-reg intrinsics.
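The distinction echesakov draws ("LSRA allocating spans of registers") can be illustrated with a toy allocator. Results held in independent registers, like those of `LoadPairVector64/128`, can go into any free registers, whereas an Arm64 register list such as the destination of `LD3` needs a free run of adjacent registers. The sketch is hypothetical (`any_free` and `consecutive_free` are illustrative names) and is not how LSRA actually works; it only shows why a span constraint is harder to satisfy than "any k registers".

```python
# Toy illustration, not the JIT's LSRA: contrast picking any k free vector
# registers with finding k consecutive ones, as an Arm64 register list needs.
from typing import List, Optional, Set

NUM_VREGS = 32  # V0..V31 on Arm64


def any_free(free: Set[int], k: int) -> Optional[List[int]]:
    """Pick any k free registers (enough for k independent results)."""
    regs = sorted(free)[:k]
    return regs if len(regs) == k else None


def consecutive_free(free: Set[int], k: int) -> Optional[List[int]]:
    """Find free registers Vs, V(s+1), ..., V(s+k-1), numbered modulo 32.

    LDn/STn/TBL encode only the first register of the list; the rest follow
    it in order, wrapping from V31 back to V0.
    """
    for start in range(NUM_VREGS):
        span = [(start + i) % NUM_VREGS for i in range(k)]
        if all(r in free for r in span):
            return span
    return None


if __name__ == "__main__":
    free = {0, 2, 4, 31}
    print(any_free(free, 3))          # [0, 2, 4]
    print(consecutive_free(free, 3))  # None: no three adjacent free registers
    print(consecutive_free(free, 2))  # [31, 0]: the list wraps around
```

A real allocator also has to weigh lifetimes and spilling when no span is free; the sketch only conveys the shape of the constraint.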

huoyaoyuan commented 2 years ago

In #66551, I've actually made multi-reg DivRem work on x64. @EgorBo, you can check whether any work remains.

JulieLeeMSFT commented 2 years ago

Moved to .NET 8.

kunalspathak commented 1 year ago

We will continue working on https://github.com/dotnet/runtime/issues/84510 in .NET 9.

kunalspathak commented 6 months ago

This is completed.