Open BruceForstall opened 1 year ago
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak See info in area-owners.md if you want to be subscribed.
| Author: | BruceForstall |
|---|---|
| Assignees: | - |
| Labels: | `area-CodeGen-coreclr`, `arch-avx512` |
| Milestone: | 8.0.0 |
Assigning to @kunalspathak. Please feel free to reassign.
> possible follow-up investigations and improvements
The LSRA TP improvements mentioned in https://github.com/dotnet/runtime/pull/83648#issuecomment-1482951574 and https://github.com/dotnet/runtime/pull/83648#issuecomment-1478819269 target the `for` loop over registers and are being done in https://github.com/dotnet/runtime/pull/85744. Other TP improvements need to happen elsewhere, for example in `impImportBlockCode()`, and I am not sure those will happen in .NET 8. Once #85744 is merged, I will move this to Future.
The PR to enable EVEX support by default introduced some JIT throughput regressions. The comments in that PR analyzed the cause of these regressions and identified possible follow-up investigations and improvements.
This issue tracks recovering some of the TP regressions by investigating the proposed improvements or mitigations.
For example, LSRA has a number of places with the following loop structure:
and with AVX-512 available, there are an additional 16 SIMD registers and 8 opmask (k) registers, so these loops iterate more.
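To illustrate why the extra registers cost throughput, here is a minimal, self-contained sketch. It is not the actual LSRA code: `RegMask`, `visitAllRegisters`, `visitCandidateRegisters`, and the two register-count constants are hypothetical names chosen for this example, and the baseline count of 32 assumes an x64 target with 16 integer and 16 SIMD registers. The first function walks every register index and tests membership, so its cost grows with the size of the register file. The second visits only the bits set in the candidate mask, which is one possible mitigation and not necessarily what #85744 implements.

```cpp
#include <cstdint>

// Hypothetical register-file sizes, for illustration only.
// Baseline assumes 16 integer + 16 SIMD registers; AVX-512 adds
// 16 more SIMD registers and 8 opmask (k) registers.
constexpr unsigned kRegCountBaseline = 32;
constexpr unsigned kRegCountAvx512   = kRegCountBaseline + 16 + 8;

// One bit per register; 64 bits is enough for 56 registers.
using RegMask = std::uint64_t;

// Pattern A: iterate over every register index and test membership.
// The iteration count equals regCount regardless of how many
// registers are actually candidates.
inline unsigned visitAllRegisters(RegMask candidates, unsigned regCount, unsigned* iterations)
{
    unsigned found = 0;
    *iterations    = 0;
    for (unsigned reg = 0; reg < regCount; reg++)
    {
        (*iterations)++;
        if ((candidates & (RegMask(1) << reg)) != 0)
        {
            found++;
        }
    }
    return found;
}

// Pattern B: iterate only over the set bits of the mask.
// The iteration count equals the number of candidate registers.
inline unsigned visitCandidateRegisters(RegMask candidates, unsigned* iterations)
{
    unsigned found = 0;
    *iterations    = 0;
    while (candidates != 0)
    {
        (*iterations)++;
        candidates &= candidates - 1; // clear the lowest set bit
        found++;
    }
    return found;
}
```

With a mask that has only two candidate registers set, the first pattern performs 32 iterations at the baseline count and 56 with the AVX-512 count, while the second performs 2 in both cases.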