Open neon-sunset opened 2 months ago
I suspect we have poor test coverage for AVX1-only hardware (which will fail on AVX2 instructions).
Ah, sorry, I thought it emitted invalid instructions; that's why I set the 9.0.0 milestone.
Description
It appears that some operations, like Vector256.Multiply, produce bad codegen on systems where only AVX is available without full AVX2 support, despite the underlying ISA supporting full-width multiplication on lanes of double and float types.
Reproduction Steps
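Given the dotnet new console --aot template:
    using System.Runtime.Intrinsics;

    unsafe {
        // Simulate work
        // float is assumed as the element type here; the issue concerns both float and double lanes
        var v = stackalloc Vector256<float>[5];
        var r = stackalloc Vector256<float>[5];
        Test(v, r);
    }

    // Squares each of the five input vectors and stores the results in r
    static unsafe void Test(
        Vector256<float>* v,
        Vector256<float>* r
    ) {
        for (var k = 0; k < 5; k++)
            r[k] = Vector256.Multiply(v[k], v[k]);
    }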
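To confirm that a machine (or build target) actually falls into the AVX-without-AVX2 configuration described here, the ISA flags can be printed at startup. This is only a minimal sketch and not part of the original report; the variable name avxOnly is made up for illustration.

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical diagnostic: reports whether the runtime sees AVX but not AVX2.
bool avxOnly = Avx.IsSupported && !Avx2.IsSupported;

Console.WriteLine($"Vector256.IsHardwareAccelerated: {Vector256.IsHardwareAccelerated}");
Console.WriteLine($"Avx.IsSupported:  {Avx.IsSupported}");
Console.WriteLine($"Avx2.IsSupported: {Avx2.IsSupported}");
Console.WriteLine($"AVX-only configuration: {avxOnly}");
```

If the last line prints True, the repro above is running in the configuration this issue is about.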
Actual behavior
Program:<<Main>$>g__Test|0_0(ulong,ulong) compiles to
Regression?
No
Known Workarounds
Replace Vector256.* method calls (or operators) on paths that use double or float with the respective Avx.* alternatives.
This is, however, a significant performance trap that users may not be aware of until the code is executed on, or targeted at, a system with this specific configuration of ISA support flags.
The resulting regressions in more complicated code can lead to worse performance than disabling AVX outright.
Ideally, .NET should not have these kinds of rough edges around the Vector APIs.
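The workaround described above could look roughly like the following. It is a sketch under assumptions: the helper name MultiplyAvxFirst is hypothetical, and float is chosen as the element type purely for illustration (Avx.Multiply also has a double overload).

```csharp
using System;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical helper (not from the issue): prefers the explicit AVX intrinsic
// and falls back to the portable API when AVX is not available.
static Vector256<float> MultiplyAvxFirst(Vector256<float> left, Vector256<float> right)
{
    return Avx.IsSupported
        ? Avx.Multiply(left, right)         // explicit 256-bit AVX multiply
        : Vector256.Multiply(left, right);  // portable fallback
}

var input = Vector256.Create(1f, 2f, 3f, 4f, 5f, 6f, 7f, 8f);
var squared = MultiplyAvxFirst(input, input);
Console.WriteLine(squared);
```

Keeping the fallback branch means the same helper still works on hardware without AVX (for example Arm64), at the cost of wrapping every affected operation by hand.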
Thanks!
Configuration
Other information
No response