AVX-512 throughput improvement opportunities

BruceForstall commented 1 year ago

The PR to enable EVEX support by default introduced some JIT throughput regressions. The comments in that PR analyzed the cause of these regressions and identified possible follow-up investigations and improvements.

This issue tracks recovering some of the throughput (TP) regressions by investigating the proposed improvements or mitigations.

For example, LSRA has a number of places with the following loop structure:

```
for (regNumber reg = REG_FIRST; reg < AVAILABLE_REG_COUNT; reg = REG_NEXT(reg))
```

and with AVX-512 available, there are an additional 16 SIMD registers and 8 opmask (k) registers, so each of these loops runs more iterations.
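To make the cost concrete, here is a minimal sketch comparing a loop over every register number with a loop that visits only the registers present in a candidate bitmask. This is an illustration under assumptions, not the JIT's code and not necessarily the approach taken in any linked PR: `kRegCount`, `regsByFullScan`, `regsBySetBits`, and `lowestSetBitIndex` are hypothetical names, and the register count of 56 is an assumed example value.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical register count for illustration only (not the JIT's real value).
constexpr unsigned kRegCount = 56;

// Portable lowest-set-bit index; mask must be non-zero.
// Real code would more likely use a hardware bit-scan intrinsic.
static unsigned lowestSetBitIndex(uint64_t mask)
{
    unsigned index = 0;
    while ((mask & 1) == 0)
    {
        mask >>= 1;
        index++;
    }
    return index;
}

// Walks every register number and tests membership in the mask.
// The number of loop steps always equals kRegCount.
std::vector<unsigned> regsByFullScan(uint64_t candidates, unsigned* steps)
{
    std::vector<unsigned> regs;
    *steps = 0;
    for (unsigned reg = 0; reg < kRegCount; reg++)
    {
        (*steps)++;
        if ((candidates & (uint64_t{1} << reg)) != 0)
        {
            regs.push_back(reg);
        }
    }
    return regs;
}

// Walks only the set bits of the mask.
// The number of loop steps equals the number of candidate registers.
std::vector<unsigned> regsBySetBits(uint64_t candidates, unsigned* steps)
{
    std::vector<unsigned> regs;
    *steps = 0;
    while (candidates != 0)
    {
        (*steps)++;
        regs.push_back(lowestSetBitIndex(candidates));
        candidates &= candidates - 1; // clear the lowest set bit
    }
    return regs;
}
```

With three candidate registers, the full scan still takes one step per register number, while the set-bit walk takes three steps. Adding registers to the enumeration raises the cost of the first form but not the second.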

ghost commented 1 year ago

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch, @kunalspathak. See info in area-owners.md if you want to be subscribed.

Author:	BruceForstall
Assignees:	-
Labels:	`area-CodeGen-coreclr`, `arch-avx512`
Milestone:	8.0.0

BruceForstall commented 1 year ago

Link: https://github.com/dotnet/runtime/pull/83648

JulieLeeMSFT commented 1 year ago

Assigning to @kunalspathak. Please feel free to reassign.

kunalspathak commented 1 year ago

> possible follow-up investigations and improvements

The LSRA TP improvements mentioned in https://github.com/dotnet/runtime/pull/83648#issuecomment-1482951574 and https://github.com/dotnet/runtime/pull/83648#issuecomment-1478819269 concern the for loops over registers and are being addressed in https://github.com/dotnet/runtime/pull/85744. Other TP improvements are needed elsewhere, for example in impImportBlockCode(), and I am not sure those will happen in .NET 8. Once #85744 is merged, I will move this issue to Future.

dotnet / runtime

AVX-512 throughput improvement opportunities #83946