dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License

Intel architecture improvements for .NET 9 #93196

Open BruceForstall opened 9 months ago

BruceForstall commented 9 months ago

This issue describes planned improvements to Intel architecture (x86, x64) ISA support for .NET 9.

In .NET 8, AVX-512 ISA support was added (see https://github.com/dotnet/runtime/issues/77034). In .NET 9, this support will be improved further and used to increase performance, especially through wider use of AVX-512 in the libraries. Investigation and implementation work will also begin on support for the newly announced AVX10.

Libraries work

Vector<T>

AVX10

AVX10 is a new set of vector ISA extensions, described here: https://www.intel.com/content/www/us/en/content-details/784267/intel-advanced-vector-extensions-10-intel-avx10-architecture-specification.html. We expect to begin preliminary work to support AVX10 in .NET 9, at least for the parts that map most directly to the already supported AVX-512. An arch-avx10 GitHub label has been defined and should be added to all related PRs and issues: https://github.com/dotnet/runtime/labels/arch-avx10
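The planned work lists "CPUID enumeration and detection" as the first AVX10 step. As a rough illustration only (this is not the runtime's code), the sketch below decodes the AVX10 bits as I understand them from the initial AVX10 architecture specification: CPUID.(EAX=07H,ECX=01H):EDX[19] for presence, and CPUID.(EAX=24H,ECX=00H):EBX for the version and supported vector lengths. The type and function names (`avx10_info`, `decode_avx10`, `query_avx10`) are hypothetical, and the bit positions should be checked against the current specification. A real check would also confirm that the OS enables the relevant register state, which is omitted here.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang helper: __get_cpuid_count */
#endif

/* Decoded AVX10 capabilities. All names here are illustrative. */
typedef struct {
    int supported;     /* CPUID.(EAX=07H,ECX=01H):EDX[19] */
    unsigned version;  /* CPUID.(EAX=24H,ECX=00H):EBX[7:0] */
    int has_256;       /* EBX[17]: 256-bit vector support */
    int has_512;       /* EBX[18]: 512-bit vector support */
} avx10_info;

/* Pure decoding step, kept separate so it can be exercised with
   synthetic register values on any machine. */
avx10_info decode_avx10(unsigned leaf7_sub1_edx, unsigned leaf24_ebx)
{
    avx10_info info = {0, 0, 0, 0};
    info.supported = (int)((leaf7_sub1_edx >> 19) & 1u);
    if (info.supported) {
        info.version = leaf24_ebx & 0xFFu;
        info.has_256 = (int)((leaf24_ebx >> 17) & 1u);
        info.has_512 = (int)((leaf24_ebx >> 18) & 1u);
    }
    return info;
}

/* Queries the current CPU. Reports "not supported" on non-x86 targets.
   Does NOT check OS support for the vector state (an omission of this sketch). */
avx10_info query_avx10(void)
{
    unsigned edx7 = 0, ebx24 = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned a = 0, b = 0, c = 0, d = 0;
    if (__get_cpuid_count(7, 1, &a, &b, &c, &d)) {
        edx7 = d;
    }
    if (((edx7 >> 19) & 1u) && __get_cpuid_count(0x24, 0, &a, &b, &c, &d)) {
        ebx24 = b;
    }
#endif
    return decode_avx10(edx7, ebx24);
}
```

Calling `query_avx10()` at startup and caching the result would mirror the usual pattern for ISA detection. Keeping `decode_avx10` free of the CPUID instruction makes the bit layout testable without AVX10 hardware.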

RyuJIT feature work

RyuJIT optimization work

Debugging / diagnostics work (@BruceForstall)

API design work

JCC erratum

Future Work

Some of the planned work for .NET 9 has been pushed out to future work.

Libraries work

AVX10

RyuJIT feature work

ghost commented 9 months ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details
This issue describes planned improvements to Intel architecture (x86, x64) ISA support for .NET 9.

In .NET 8, AVX-512 ISA support was added (see https://github.com/dotnet/runtime/issues/77034). In .NET 9, this support will be further improved and leveraged for improved performance, especially with expanded libraries utilization of the recently implemented AVX-512 support. Investigations and implementation will start to support the newly announced AVX10.

## Libraries work

- [ ] Light up Utf8/Utf16 code with Vector512. https://github.com/dotnet/runtime/issues/86119
- [ ] Light up Ascii.Utility methods with Vector512 code paths. https://github.com/dotnet/runtime/issues/89280
- [ ] Light up BitArray with Vector512
- [ ] Light up String with Vector512
- [ ] Light up Base64 encode/decode with Vector512
- [ ] Consider SIMD JSON acceleration?
- [ ] Consider XML API acceleration?
- [ ] Consider SIMD/AVX optimization for Tensor (https://github.com/dotnet/runtime/issues/89639)

## RyuJIT feature work

- [ ] Consider Vector<T> expanding to Vector512, either automatically or opt-in.

## RyuJIT feature work

- [ ] Add EVEX encoding opmask (k) register masking for per-instruction opmask to xarch emitter. https://github.com/dotnet/runtime/issues/80821
- [ ] Enable EVEX embedded rounding support in xarch emitter. https://github.com/dotnet/runtime/issues/93154
- [ ] Add optimization for scalar/vector conversion of uint32/uint64 to/from packed float/double. https://github.com/dotnet/runtime/issues/80829
- [ ] https://github.com/dotnet/runtime/issues/85207

## RyuJIT optimization work

- [ ] AVX512: Fold some bitwise operations to vpternlogq https://github.com/dotnet/runtime/issues/84534
- [ ] Add optimization for scalar conversion of float/double to ulong https://github.com/dotnet/runtime/issues/89279

## AVX10

AVX10 is a new set of vector ISA extensions, described [here](https://www.intel.com/content/www/us/en/content-details/784267/intel-advanced-vector-extensions-10-intel-avx10-architecture-specification.html). We expect to begin preliminary work to support AVX10 in .NET 9, at least the parts that most directly map to the already supported AVX-512.

- [ ] Add VM/JIT AVX10 awareness: CPUID enumeration and detection
- [ ] Propose a new AVX10 API
- [ ] Do JIT codegen implementation of the API
- [ ] Enhance Vector256 codegen with AVX10 instructions (related to what has already been done for AVX512VL)
- [ ] Allow additional 16 YMM registers for AVX10
- [ ] Allow embedded rounding for YMM/ZMM (related: https://github.com/dotnet/runtime/issues/93154)
- [ ] Convert remaining AVX2 implementations to Vector256
- [ ] Allow AVX-512 optimizations for YMM (e.g., scalar conversion, vpternlog)

## CI/testing work

## Debugging / diagnostics work

- [ ] https://github.com/dotnet/runtime/issues/87854
- [ ] Ensure ELT (enter/leave/tailcall hooks, for profiling) works.

## API design work

- [ ] https://github.com/dotnet/runtime/issues/73604
- [ ] (Reconsider implementing?) https://github.com/dotnet/runtime/issues/74613
- [ ] https://github.com/dotnet/runtime/issues/76579
Author: BruceForstall
Assignees: -
Labels: `area-CodeGen-coreclr`, `User Story`
Milestone: 9.0.0
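
For readers unfamiliar with the "fold some bitwise operations to vpternlogq" item: the instruction evaluates any three-input boolean function, selected by an 8-bit immediate that acts as a truth table. A common way to derive that immediate is to evaluate the function on the constants 0xF0, 0xCC and 0xAA. The sketch below shows that derivation plus a scalar reference model of the per-bit lookup. It is an illustration under those assumptions, not RyuJIT's implementation, and all names in it (`ternlog_imm8`, `ternlog_model`, `and_or`, `xor3`, `bit_select`) are hypothetical.

```c
#include <stdint.h>

/* A three-input bitwise function operating on 8-bit truth-table columns. */
typedef uint8_t (*ternary_fn)(uint8_t a, uint8_t b, uint8_t c);

/* Canonical truth-table columns: bit i of each constant is the value of
   that input when the 3-bit index (a<<2 | b<<1 | c) equals i. */
enum { TERNLOG_A = 0xF0, TERNLOG_B = 0xCC, TERNLOG_C = 0xAA };

/* The immediate is the function evaluated on the canonical columns. */
uint8_t ternlog_imm8(ternary_fn f)
{
    return f(TERNLOG_A, TERNLOG_B, TERNLOG_C);
}

/* Example functions that a JIT might want to fold into one instruction. */
uint8_t and_or(uint8_t a, uint8_t b, uint8_t c)     { return (uint8_t)((a & b) | c); }
uint8_t xor3(uint8_t a, uint8_t b, uint8_t c)       { return (uint8_t)(a ^ b ^ c); }
uint8_t bit_select(uint8_t a, uint8_t b, uint8_t c) { return (uint8_t)((a & b) | (~a & c)); }

/* Scalar reference model for one 64-bit lane: each result bit is looked up
   in imm8 using the three corresponding input bits as the index. */
uint64_t ternlog_model(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
    uint64_t result = 0;
    for (int i = 0; i < 64; i++) {
        unsigned idx = (unsigned)((((a >> i) & 1u) << 2) |
                                  (((b >> i) & 1u) << 1) |
                                   ((c >> i) & 1u));
        result |= (uint64_t)((imm8 >> idx) & 1u) << i;
    }
    return result;
}
```

With these definitions, `ternlog_imm8(and_or)` yields 0xEA and `ternlog_imm8(xor3)` yields 0x96, so a two-instruction sequence such as an AND followed by an OR can be expressed as a single ternary-logic operation with the matching immediate.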

MichalPetryka commented 9 months ago

Is there maybe any interest in adding the workaround for the JCC erratum (#35730) in .NET 9? I've seen minor codegen improvements be reported as huge regressions because the code started to hit this issue.

BruceForstall commented 9 months ago

> Is there maybe any interest in adding the workaround for the JCC erratum (#35730) in .NET 9? I've seen minor codegen improvements be reported as huge regressions because the code started to hit this issue.

@AndyAyersMS has expressed a desire to at least have a mode that could be used for performance testing to avoid the JCC erratum. Whether we could enable this always would depend on how uniform the improvements would be. It is expected there would be some code size regressions -- possibly significant -- due to the need to insert NOPs.
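
To make the NOP cost concrete: per Intel's published guidance on the erratum, a jump (including a macro-fused compare-and-branch pair, treated as one unit) is affected when it crosses a 32-byte boundary or ends exactly on one. The sketch below checks for that and computes how many padding bytes would move the jump to the next boundary. It is a simplified model, not the JIT's alignment logic; the names `jcc_affected` and `jcc_padding` are hypothetical, and it assumes the jump is shorter than 32 bytes.

```c
#include <stdint.h>

enum { JCC_WINDOW = 32 };  /* boundary size relevant to the erratum */

/* Returns 1 if a jump occupying bytes [addr, addr + len) either crosses a
   32-byte boundary or ends exactly on one, 0 otherwise. For a macro-fused
   pair, pass the start of the first instruction and the combined length. */
int jcc_affected(uint64_t addr, unsigned len)
{
    /* The first byte and the byte just past the jump must fall in the
       same 32-byte window for the placement to be safe. */
    return (addr / JCC_WINDOW) != ((addr + len) / JCC_WINDOW);
}

/* Number of padding (NOP) bytes to emit before the jump so that it starts
   at the next 32-byte boundary; 0 if the current placement is already safe.
   Assumes 0 < len < 32. */
unsigned jcc_padding(uint64_t addr, unsigned len)
{
    if (!jcc_affected(addr, len)) {
        return 0;
    }
    return (unsigned)(JCC_WINDOW - (addr % JCC_WINDOW));
}
```

For example, a 2-byte jump at offset 30 ends on the boundary at 32 and needs 2 bytes of padding, while the same jump at offset 28 needs none. Summed over every branch in a method, this padding is the source of the code size regression mentioned above.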

MichalPetryka commented 9 months ago

> It is expected there would be some code size regressions -- possibly significant -- due to the need to insert NOPs.

Didn't we already accept that tradeoff with loop alignment?

BruceForstall commented 9 months ago

> Didn't we already accept that tradeoff with loop alignment?

Yes, but this could be a very different magnitude of regression that will need to be measured.

BruceForstall commented 9 months ago

I went ahead and created https://github.com/dotnet/runtime/issues/93243 related to adding a JIT mode to avoid the JCC erratum, and linked it here.

Spacefish commented 9 months ago

I added Vector512 support for Min/Max of simple numeric datatypes in this PR: https://github.com/dotnet/runtime/pull/93369

huoyaoyuan commented 4 months ago

What about the upcoming APX extension? It looks like a major change to x86-64. There are discussions about an ABI for APX on the GCC mailing lists: https://gcc.gnu.org/pipermail/gcc/2023-July/242154.html https://gcc.gnu.org/pipermail/gcc-help/2023-August/142801.html

Maybe it's too early for .NET to adopt APX, but I'd like to see the estimated timeline. Should we wait for MSVC to define the calling convention?

tannergooding commented 4 months ago

We want to have hardware available on which it can run.

While Intel hasn't given an official timeline yet, such hardware will most likely not be available within the lifetime of .NET 9, which ships in November 2024 and will be out of support around May 2026.

I expect this work will be done for .NET 10, which will likely ship around November 2025 (assuming we don't change our current pacing of releases) and be out of support around November 2028.

MichalPetryka commented 4 months ago

> We want to have hardware available on which it can run.

Would using Intel SDE not be enough for testing the support for it? It seems to already have support for emulating AVX10 and APX.

tannergooding commented 4 months ago

There's no point in scheduling work to be done for hardware that doesn't exist yet, particularly if that hardware is unlikely to exist within the lifetime of a release.

That is, we know that AVX10 is going to exist for Granite Rapids, as per the official announcement: https://www.intel.com/content/www/us/en/content-details/784267/intel-advanced-vector-extensions-10-intel-avx10-architecture-specification.html. The AVX10.1 work is correspondingly happening in .NET 9.

While no official release date has been announced for APX, it is unlikely to happen in a timeframe that makes .NET 9 a good choice to target.

JulieLeeMSFT commented 3 weeks ago

Updated the planned work with the current status. Marked completed work and moved the items that are being pushed out to the Future Work section.