Open huoyaoyuan opened 1 month ago
Tagging subscribers to this area: @dotnet/area-system-numerics-tensors See info in area-owners.md if you want to be subscribed.
@michaelgsharp / @tannergooding -- Can the two of you chat about this please and decide:
`TensorPrimitives` and not `Tensor<T>`?
This is an optimization, not a correctness fix, and it is a fairly involved change (especially with the relevant perf testing). Given that `TensorPrimitives` has been stable since .NET 8, I'd leave this as is and optimize it for .NET 10 instead.
I agree with Tanner that we should push the majority of this back to .NET 10. It's trivial to disable the vectorization for the int cases, and that will give us a few wins, so we should do that part in .NET 9. That will still leave many cases running suboptimally, and those we should tackle in .NET 10.
Moving to .NET 10. I have put up #106288 to avoid vectorization for types that aren't `float` or `double`. I also called out cases where a manual `for` loop is likely to remain faster until .NET 10 (particularly when the divisor is a constant).
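For readers following along, the kind of manual loop referred to here might look like the sketch below. The class name, method name, and the divisor 7 are illustrative assumptions and are not taken from the PR. With a constant divisor written in the source, the JIT can typically replace the division with a multiply-and-shift sequence, which is one plausible reason such a loop stays competitive.

```csharp
using System;

static class ManualDivide
{
    // Hypothetical helper: divides every element by the compile-time constant 7.
    // Because the divisor is a literal, the compiler is free to avoid a real
    // division instruction for each element.
    public static void DivideBy7(ReadOnlySpan<int> source, Span<int> destination)
    {
        if (destination.Length < source.Length)
            throw new ArgumentException("Destination is too short.", nameof(destination));

        for (int i = 0; i < source.Length; i++)
        {
            destination[i] = source[i] / 7; // truncates toward zero, like any C# int division
        }
    }
}
```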
`TensorPrimitives` by default delegates simple operators to vector intrinsics. This is fine for most operations, but integer division (IDIV) is an exception.
First, most (if not all) ISAs lack a vector integer-division instruction. I've checked AVX-512/AVX2 and SVE/AdvSimd but couldn't find one. Thus our vector intrinsic falls back to software emulation. On my CPU with AVX2, it's about 2.5x slower than a naive for-loop on `int[1024] / int(scalar)`.
When dividing by a common divisor, there is also the widely used preinv algorithm, which turns the division into a cheaper multiplication that is supported for vectorization on various ISAs.
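To make the idea concrete, here is a minimal scalar sketch of division by a precomputed reciprocal. It assumes "preinv" refers to the usual technique of precomputing a fixed-point inverse of the divisor once and then replacing each division with a multiply-high. The sketch is restricted to `uint` to keep it short (signed `int` needs extra sign handling), and `PreinvSketch`, `ComputeMagic`, `Divide`, and `DivideAll` are hypothetical names, not library APIs. Only `Math.BigMul` is a real BCL call.

```csharp
using System;

static class PreinvSketch
{
    // Precompute ceil(2^64 / divisor). Requires divisor >= 2, because the
    // value for divisor == 1 (2^64) does not fit in a ulong.
    public static ulong ComputeMagic(uint divisor)
    {
        if (divisor < 2)
            throw new ArgumentOutOfRangeException(nameof(divisor));
        return ulong.MaxValue / divisor + 1;
    }

    // floor(value / divisor), computed as the high 64 bits of magic * value.
    public static uint Divide(uint value, ulong magic)
        => (uint)Math.BigMul(magic, (ulong)value, out ulong _);

    // Divides every element by the same runtime divisor, paying for one real
    // division up front (inside ComputeMagic) instead of one per element.
    public static void DivideAll(ReadOnlySpan<uint> source, uint divisor, Span<uint> destination)
    {
        if (destination.Length < source.Length)
            throw new ArgumentException("Destination is too short.", nameof(destination));
        if (divisor == 0)
            throw new DivideByZeroException();
        if (divisor == 1)
        {
            source.CopyTo(destination); // x / 1 == x; the magic value would overflow
            return;
        }

        ulong magic = ComputeMagic(divisor);
        for (int i = 0; i < source.Length; i++)
        {
            destination[i] = Divide(source[i], magic);
        }
    }
}
```

The per-element work is a single widening multiply, which is the part that maps onto vector multiply instructions. A vectorized version would broadcast the precomputed value and do the same multiply-high per lane.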
I'm not sure if integer division is popular enough for this optimization. But we should at least disable `DivideOperator.Vectorizable` for integer types, because it ends up using software emulation.
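As a rough illustration of what gating `Vectorizable` by element type could look like, the sketch below uses a simplified stand-in for the operator abstraction. `IBinaryOperatorSketch<T>`, `DivideOperatorSketch<T>`, and `ElementwiseSketch` are hypothetical names invented for this example and do not mirror the library's actual internal types. The SIMD path is deliberately elided.

```csharp
using System;
using System.Numerics;

// Hypothetical stand-in for an operator abstraction; not the library's real types.
interface IBinaryOperatorSketch<T>
{
    static abstract bool Vectorizable { get; }
    static abstract T Invoke(T x, T y);
}

readonly struct DivideOperatorSketch<T> : IBinaryOperatorSketch<T>
    where T : IDivisionOperators<T, T, T>
{
    // Only opt in to the vector path for types with hardware vector division.
    // For int and other integer types this reports false.
    public static bool Vectorizable => typeof(T) == typeof(float) || typeof(T) == typeof(double);

    public static T Invoke(T x, T y) => x / y;
}

static class ElementwiseSketch
{
    // Applies the operator to each element of x with the scalar y.
    public static void Apply<T, TOperator>(ReadOnlySpan<T> x, T y, Span<T> destination)
        where TOperator : IBinaryOperatorSketch<T>
    {
        if (destination.Length < x.Length)
            throw new ArgumentException("Destination is too short.", nameof(destination));

        // A real implementation would branch to a SIMD loop when
        // TOperator.Vectorizable is true; this sketch always uses the scalar
        // loop, which is the path integer division would take after the change.
        for (int i = 0; i < x.Length; i++)
        {
            destination[i] = TOperator.Invoke(x[i], y);
        }
    }
}
```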