For some vectorized functions like sum, handling leftovers when the array length is not a multiple of the Vector width is obvious and easy: we just add the remaining elements to the running total.
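For reference, here is a minimal sketch of that pattern, assuming an int-only array to keep it short. The name `simdSumSketch` is hypothetical and this is not the library's implementation. Full Vector-width chunks are summed lane-wise, and leftovers go straight into the scalar total, so no padding is involved.

```fsharp
open System.Numerics

// Hypothetical sketch, int-only: SIMD addition for full chunks,
// plain scalar addition for the leftover elements.
let simdSumSketch (array : int[]) : int =
    let count = Vector<int>.Count
    let mutable acc = Vector<int>.Zero
    let mutable i = 0
    // Vectorized part: add each full Vector-width chunk lane-wise.
    while i <= array.Length - count do
        acc <- acc + Vector<int>(array, i)
        i <- i + count
    // Collapse the accumulator's lanes into one scalar.
    let mutable total = 0
    for lane in 0 .. count - 1 do
        total <- total + acc.[lane]
    // Leftovers: add the remaining elements directly to the running total.
    while i < array.Length do
        total <- total + array.[i]
        i <- i + 1
    total
```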
For others, like map and exists, the way I currently handle leftovers can produce unexpected or undesired behavior. For instance, say someone's map function did something like:
```fsharp
fun (vector : Vector<int>) -> if vector.[3] = 0 then Vector<int>(10) else vector
```
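The current approach pads the leftovers with zeros to fill one last Vector. When there are fewer than four leftover elements, element 3 of that padded Vector is one of those padding zeros rather than real data, so the final values all become 10 even though that wasn't intended.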
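To make the failure concrete, here is a hypothetical reconstruction of a zero-padding map (`mapPaddedSketch` is an illustrative name, not the library's code). It assumes the leftovers are copied to the front of a zero-filled buffer, mapped as one Vector, and only the first `leftover` results are kept.

```fsharp
open System.Numerics

// Hypothetical zero-padding strategy, for illustration only.
let mapPaddedSketch (vf : Vector<int> -> Vector<int>) (array : int[]) : int[] =
    let count = Vector<int>.Count
    let result : int[] = Array.zeroCreate array.Length
    let mutable i = 0
    // Full Vector-width chunks are mapped directly.
    while i <= array.Length - count do
        (vf (Vector<int>(array, i))).CopyTo(result, i)
        i <- i + count
    let leftover = array.Length - i
    if leftover > 0 then
        // Copy the leftovers into a zero-filled buffer: the padding lanes are 0.
        let padded : int[] = Array.zeroCreate count
        Array.blit array i padded 0 leftover
        let mapped : int[] = Array.zeroCreate count
        (vf (Vector<int>(padded))).CopyTo(mapped)
        // Keep only the results that correspond to real elements.
        Array.blit mapped 0 result i leftover
    result

// The user's function: replace the whole Vector with 10s if lane 3 is zero.
let tenIfLane3Zero (v : Vector<int>) =
    if v.[3] = 0 then Vector<int>(10) else v
```

With an input of all 1s and a single leftover element, the full chunk is left unchanged, but the leftover comes back as 10, because lane 3 of the padded Vector is a padding zero.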
I think the only correct approach is to have the user provide two functions: one for handling the Vectors and another for handling the leftover elements. The downside is that the API becomes a little more cumbersome to use when the same function works for both:
```fsharp
let x = array |> Array.SIMD.map square
// would become
let x = array |> Array.SIMD.map square square
```
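Passing the same function twice works because an `inline` F# function is resolved separately at each use, and `Vector<'T>` defines the same arithmetic operators as the scalar type. A minimal sketch, assuming `square` is defined as below (the definition is illustrative, not taken from the library):

```fsharp
open System.Numerics

// One inline definition; `*` is resolved per use site.
let inline square x = x * x

// The same source function instantiated for the Vector and scalar roles.
let vectorSquare : Vector<int> -> Vector<int> = square
let scalarSquare : int -> int = square
```

So `square square` is redundant to type, but it does not require two separate definitions.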
On the bright side, this simplifies the code of most of the functions, and it even reduces allocations and improves speed for small arrays:
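```fsharp
open System.Numerics

/// vf maps whole Vectors; sf maps the leftover elements one at a time.
let inline mapnew
    (vf : ^T Vector -> ^U Vector) (sf : ^T -> ^U) (array : ^T[]) : ^U[] =
    checkNonNull array // null-argument check helper defined elsewhere (not shown)
    let count = Vector< ^T>.Count
    // CopyTo writes a full Vector< ^U> per step, so both widths must match.
    if count <> Vector< ^U>.Count then invalidArg "array" "Output type must have the same width as input type."
    let len = array.Length
    let result : ^U[] = Array.zeroCreate len
    let lenLessCount = len - count
    let mutable i = 0
    // Vectorized part: map each full Vector-width chunk with vf.
    while i <= lenLessCount do
        (vf (Vector< ^T>(array, i))).CopyTo(result, i)
        i <- i + count
    // Leftovers: map the remaining elements with the scalar function sf.
    while i < len do
        result.[i] <- sf array.[i]
        i <- i + 1
    result
```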
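The same two-function idea carries over to `exists`. Below is a hypothetical sketch (`existsSketch` is not an existing library function) that mirrors the structure of `mapnew`. `vf` tests a whole Vector and `sf` tests a single leftover element, so the predicate never sees padding values. It uses `nullArg` in place of the library's `checkNonNull` helper so that it is self-contained.

```fsharp
open System.Numerics

// Hypothetical sketch: vf tests full Vectors, sf tests leftover elements.
let inline existsSketch
    (vf : Vector< ^T> -> bool) (sf : ^T -> bool) (array : ^T[]) : bool =
    if isNull array then nullArg "array"
    let count = Vector< ^T>.Count
    let len = array.Length
    let mutable found = false
    let mutable i = 0
    // Full Vector-width chunks; stop early once a match is found.
    while not found && i <= len - count do
        found <- vf (Vector< ^T>(array, i))
        i <- i + count
    // Leftover elements, handled one at a time by the scalar predicate.
    while not found && i < len do
        found <- sf array.[i]
        i <- i + 1
    found
```

For example, searching for a zero with `Vector.EqualsAny(v, Vector<int>.Zero)` as the Vector predicate and `x = 0` as the scalar one gives no false positive on an array of 1s. A zero-padded final Vector, by contrast, would contain zeros that are not in the input.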
Thoughts? Alternative approaches?