Open Smaug123 opened 1 year ago
Come to think of it, maybe we could literally just call `System.Linq.Max` from `Array.max`.
Many vectorization intrinsics are not available in `netstandard2.0` or even `2.1`.
> Many vectorization intrinsics are not available in `netstandard2.0` or even `2.1`.
Yep, this would have to be conditionally compiled in on the appropriate frameworks, I imagine.
One difficulty is that `Vector<'a>` requires `'a : (new : unit -> 'a) and 'a : struct and 'a :> System.ValueType`. I do not know how to make this change in a way that doesn't break the API, because I don't know how to provide one implementation for the `'a` which satisfy those requirements and one for the `'a` which do not. But it would be great if we could simply provide various implementations of `Array.max` and have the correct one be selected.
We relaxed this in .NET 8 and it no longer requires any constraint. This allows generic code to simply do the equivalent of `if (Vector<T>.IsSupported && Vector.IsHardwareAccelerated) { ... }`
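In F# terms, that guard might look roughly like the following. This is a sketch only: it assumes a `net8.0` target (where the `Vector<'T>` constraints are relaxed), and `maxDispatch`, `scalarMax`, and `simdMax` are illustrative names, not real FSharp.Core APIs:

```fsharp
open System.Numerics

// Sketch: dispatch to a SIMD path only when the hardware and the
// element type actually support Vector<'T>; otherwise stay scalar.
let maxDispatch (xs: 'T[]) (scalarMax: 'T[] -> 'T) (simdMax: 'T[] -> 'T) =
    if Vector.IsHardwareAccelerated && Vector<'T>.IsSupported then
        simdMax xs   // vectorized implementation
    else
        scalarMax xs // portable fallback
```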
That being said, we do indeed accelerate `System.Linq.Max` already. There are also more considerations than `System.Numerics.Vector<T>`, and you can achieve greater performance by taking advantage of the newer `System.Runtime.Intrinsics` APIs. We are likewise working on providing more vectorizable functions over spans of data. This is being done via `System.Numerics.Tensors.TensorPrimitives`, which ships out of band and only supports `float` in .NET 8, but which will expand to the full set of `T` that `Vector<T>` supports in .NET 9.
-- Noting that there may be nuance for exact semantics which may or may not be compatible with F#'s existing semantics, and so the ability to accelerate using such APIs likely needs to be considered case by case.
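For illustration, calling the out-of-band package from F# might look like this. Hedged sketch: it assumes the `System.Numerics.Tensors` 8.0 NuGet package, where `TensorPrimitives.Max` is only defined for `float32` spans:

```fsharp
open System
open System.Numerics.Tensors // out-of-band NuGet package

// TensorPrimitives.Max reduces a span to its largest element,
// using SIMD internally where the hardware allows it.
let xs = [| 1.0f; 5.0f; 3.0f; 4.0f |]
let m = TensorPrimitives.Max(ReadOnlySpan<float32>(xs))
// m = 5.0f
```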
Cool - if it were purely my decision, then, I'd just replace the implementation of `Array.max` with a single call to `System.Linq`'s version, since it sounds like keeping up with `System.Linq` will be a bunch of duplicative work.
This needs to be a language suggestion, since it's a big change to core library.
It needs to cover corner cases and comparisons with the current implementation (especially around NaN handling, existing constraints vs. those coming from `Enumerable`, and similar details, as these might be breaking changes), and also include a proposed design with different targets for the core library.
We have no current plans for a new target for FSharp.Core; there are too many open questions around it.
As a testing ground, a separate library with the same modules, shadowing the built-in ones, might work.
I would be in favor of the following:
- A separate standalone library with new .NET targets, which would provide the vectorized functions
- A separate fsharp suggestion on FSharp.Core evolution w.r.t. new APIs available in .NET
- If the conclusion is that a modular design is fitting (the real "core" being netstandard, with additional libs based on newer targets), the library from #1 above would live directly in this repo and be considered part of the "FSharp.Core family of split libraries"
This should be a separate suggestion: `FSharp.Core` is a very special library, and separating it will require a metric tonne of work, including type forwarding, changes to the compiler, etc.
@KevinRansom tried to investigate it, it turned out to be a gigantic task.
Related, in regard to changes to the standard library: https://github.com/dotnet/fsharp/issues/13207
> This needs to be a language suggestion, since it's a big change to core library.
@vzarytovskii, could you elaborate on this a bit?
I definitely understand the general consideration around ensuring using any existing BCL method doesn't change the behavior or edge case handling before F# could make the switch to using it. Hence my comment of "Noting that there may be nuance for exact semantics which may or may not be compatible with F#'s existing semantics, and so the ability to accelerate using such APIs likely needs to be considered case by case."
However, SIMD doesn't itself change the semantics of the algorithm, only the performance, and it can be made to handle all the same edge cases identically to the scalar implementation. So while it's work, and something that the F# team would have to be willing to maintain, it should just be an internal implementation detail that is otherwise unobservable to the consumer of the API. So I don't understand why the use of SIMD in general is considered a "big change".
-- `FSharp.Core` is already multitargeting `netstandard2.0` and `netstandard2.1` and so already has access to `System.Numerics.Vector<T>`, under which it could do some basic acceleration. However, it could also reasonably multi-target `net8.0` (and, moving forward, the latest version) and simply `#if NET8_0_OR_GREATER` to light up usage of the newer intrinsics (whether on `Vector<T>` or the newer `Vector128<T>` and friends).
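A hypothetical shape for that multi-targeting (names and structure are illustrative only, not a proposed FSharp.Core design):

```fsharp
// Portable scalar path for netstandard2.0/2.1; newer intrinsics
// light up only when the net8.0 target is being compiled.
#if NET8_0_OR_GREATER
open System.Runtime.Intrinsics

let maxImpl (xs: int[]) =
    // a Vector128.Max-based loop would go here; falling back to
    // the scalar path keeps this sketch short
    Array.max xs
#else
let maxImpl (xs: int[]) = Array.max xs
#endif
```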
> > This needs to be a language suggestion, since it's a big change to core library.
>
> @vzarytovskii, could you elaborate on this a bit? I don't understand why the use of SIMD in general is considered a "big change".
Because generally it might be a behaviour change at runtime (around NaNs, constraints, exceptions thrown, just general behaviour), it needs to be proven otherwise, hence the need for a suggestion, test matrix, detailed description, and probably some prototyping. We don't want to accidentally break someone just by upgrading the SDK (and, implicitly, FSharp.Core).
> However, SIMD doesn't itself change the semantics of the algorithm, only the performance
That is not necessarily true; we've seen different behaviour between our functions and what's in `Enumerable` in some corner cases. This needs to be investigated and described. Example: https://github.com/dotnet/fsharp/issues/13207
> However, it could also reasonably multi-target `net8.0` (and, moving forward, the latest version) and simply `#if NET8_0_OR_GREATER` to light up usage of the newer intrinsics (whether on `Vector<T>` or the newer `Vector128<T>` and friends).
We aren't considering adding additional targets at this point; it needs to be designed and discussed separately: which targets do we support, for how long, how often do we introduce new ones/discontinue old ones, do we keep FSharp.Core as a single assembly or separate it into multiple ones, and so on.
> That is not necessarily true; we've seen different behaviour between our functions and what's in `Enumerable` in some corner cases. This needs to be investigated and described. Example: https://github.com/dotnet/fsharp/issues/13207
That's a difference in the implementation of two similar functions; it is not a difference caused by SIMD.
That is, `Enumerable.M()` and some functionally similar `M()` in `FSharp.Core` may indeed be different. In that case, F# would presumably not want to use the BCL function.

However, nothing would prevent F# from writing its own vectorized code that behaves identically to F#'s existing functionality. It wouldn't even require FSharp.Core to reference anything additional in the case of the existing `netstandard2.1` target. It is truly an implementation detail.
> That's a difference in the implementation of two similar functions; it is not a difference caused by SIMD.
That's what I meant, sorry that was unclear - if we use existing implementations from the BCL, it will be a big change from the standard library's perspective (using BCL code vs. our own will affect inlining, for example) and might be breaking.
So that's why a concrete language/library proposal is needed, so we are clear on what's being proposed, what the consequences will be, what the alternatives are, etc.
> However, nothing would prevent F# from writing its own vectorized code that behaves identically to F#'s existing functionality. It wouldn't even require FSharp.Core to reference anything additional in the case of the existing `netstandard2.1` target. It is truly an implementation detail.
Even in this case it has the potential to break something if some details are forgotten, hence it needs a suggestion + design.
`Array.max`, for example, can be made more than an order of magnitude faster with vectorisation on appropriate hardware. The money shot (forgive the rather random naming; I copy-pasted some code which used to be about factorials, for example), on a machine with `Vector<byte>.Count = 16`:

It's trivial to write the property-based test that asserts this agrees with `Array.max`; and I reran this using `Array.max` directly (rather than my vendored version) and got basically the same numbers.

**Describe the solution you'd like**

All appropriate array methods should be vectorised. There are examples in System.Linq.
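A minimal sketch of what such a vectorised maximum could look like for `int` (illustrative only: `vectorizedMax` is a hypothetical name, and real code would need per-type and NaN-semantics care as discussed above):

```fsharp
open System.Numerics

// Sketch: consume Vector<int>.Count lanes at a time, reduce the
// accumulator across lanes, then finish any tail elements scalar.
let vectorizedMax (xs: int[]) =
    if xs.Length = 0 then invalidArg "xs" "empty array"
    elif Vector.IsHardwareAccelerated && xs.Length >= Vector<int>.Count then
        let width = Vector<int>.Count
        let mutable acc = Vector<int>(xs, 0)
        let mutable i = width
        while i <= xs.Length - width do
            acc <- Vector.Max(acc, Vector<int>(xs, i))
            i <- i + width
        let mutable best = acc[0]
        for lane in 1 .. width - 1 do
            best <- max best acc[lane]
        for j in i .. xs.Length - 1 do
            best <- max best xs[j]
        best
    else
        Array.max xs
```

The property-based test mentioned above, asserting `vectorizedMax xs = Array.max xs` for all non-empty `xs`, is the natural correctness check for such a sketch.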
One difficulty is that `Vector<'a>` requires `'a : (new : unit -> 'a) and 'a : struct and 'a :> System.ValueType`. I do not know how to make this change in a way that doesn't break the API, because I don't know how to provide one implementation for the `'a` which satisfy those requirements and one for the `'a` which do not. But it would be great if we could simply provide various implementations of `Array.max` and have the correct one be selected.

**Describe alternatives you've considered**
We can always not do this; people can write this themselves, or call out to System.Linq manually. It's sad if they have to do that, though.
**Additional context**

Program: