Open msedi opened 7 months ago
This is definitely something for us to look more into, but the difference seen can largely be summed up as a quirk of how the benchmark was written and the results being heavily skewed by that.
Description
We are doing a lot of data processing with large arrays. In the .NET Framework we had to create our own library (ArrayMath) to avoid repetitive code. We started out with pointer-based arithmetic, since regular loops were too slow in the beginning. Then we moved over to Vector and Span, which now gives us the best performance.
Since the introduction of TensorPrimitives I thought we could get rid of our own implementation, so I ran a few benchmarks. I found that our Max implementation was really poor (TensorPrimitives is roughly 4-5x faster), since we had only implemented it with Vector128.
So I ran further benchmarks comparing ArrayMath.MulAdd and TensorPrimitives.MultiplyAdd, for an image of 512x512.
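For readers unfamiliar with the operation being compared, here is a minimal sketch of an element-wise multiply-add written with Vector<T> and Span. It is not the ArrayMath.MulAdd implementation from this issue; the class name MulAddSketch and its structure are hypothetical and only illustrate the general "vector main loop plus scalar tail" pattern.

```csharp
using System;
using System.Numerics;

// Hypothetical helper for illustration only; NOT the ArrayMath.MulAdd from this issue.
public static class MulAddSketch
{
    // Computes dst[i] = x[i] * y[i] + z[i] for every element of x.
    public static void MulAdd(ReadOnlySpan<float> x, ReadOnlySpan<float> y,
                              ReadOnlySpan<float> z, Span<float> dst)
    {
        if (y.Length != x.Length || z.Length != x.Length || dst.Length < x.Length)
            throw new ArgumentException("Input lengths must match and dst must be large enough.");

        int width = Vector<float>.Count; // number of float lanes per hardware vector
        int i = 0;

        // Vectorized main loop: processes one full vector per iteration.
        for (; i <= x.Length - width; i += width)
        {
            var vx = new Vector<float>(x.Slice(i));
            var vy = new Vector<float>(y.Slice(i));
            var vz = new Vector<float>(z.Slice(i));
            (vx * vy + vz).CopyTo(dst.Slice(i));
        }

        // Scalar tail: handles the elements that do not fill a whole vector.
        for (; i < x.Length; i++)
            dst[i] = x[i] * y[i] + z[i];
    }
}
```

Calling MulAddSketch.MulAdd(x, y, z, dst) is intended to produce the same element-wise (x * y) + z result as TensorPrimitives.MultiplyAdd(x, y, z, dst) from the System.Numerics.Tensors package. A production implementation would likely add loop unrolling and alignment handling, which this sketch leaves out.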
So this is currently the result:
Configuration
Regression?
There is currently no regression, but I think it would be worth checking this.
Data
Here's the BenchmarkDotNet code:
And this is what our ArrayMath.MulAdd looks like:
It seems that with far less effort (no offense!) our library gets faster results than TensorPrimitives. I find it a bit sad that the effort put into TensorPrimitives does not pay off here. Maybe there's a good explanation, but I couldn't find one.