Open aminya opened 4 years ago
Intriguing. I suppose a few tests would be necessary to see if the speed is comparable with base.
We should see if Intel provides a scalar API, because if it only provides a vector API, and the function call uses the vector processing unit of the CPU, we cannot parallelize the function further. This is like vectorizing an already vectorized function (although with a size of 1), which has no effect.
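As a sketch of the concern (the names below are illustrative, not the actual IntelVectorMath API): a vector-only API forces scalar use through a length-1 array, which adds allocation overhead and cannot be fused into an outer `@simd` loop.

```julia
# Stand-in for a vector-only SVML-style call; a real binding would
# dispatch to Intel's vector function rather than broadcast Base.exp.
vexp!(out::Vector{Float64}, x::Vector{Float64}) = (out .= exp.(x); out)

# Scalar use via the vector API: allocate, wrap, call, unwrap.
# The temporary length-1 arrays defeat vectorization of an outer loop.
scalar_via_vector(x::Float64) = vexp!(Vector{Float64}(undef, 1), [x])[1]
```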
Now the library only supports doing a calculation on an `Array` and also returns an `Array`. It may be worthwhile to define scalar methods too. This way we only use Intel for calculating one scalar number, which (if possible) helps to fuse for-loops with broadcasted functions and use `@avx` or `@simd` features of Julia instead for parallelization.
Related to https://github.com/JuliaMath/IntelVectorMath.jl/issues/43, which can help to implement the 3rd macro.
This can also solve https://github.com/JuliaMath/IntelVectorMath.jl/issues/22, by using Intel only for a scalar call and providing an SVML-like behavior using `@avx` or `@simd`.

Places to look into: