JuliaMath / IntelVectorMath.jl

Julia bindings for the Intel Vector Math Library

Scalar Calculation? #44

Open aminya opened 4 years ago

aminya commented 4 years ago

Currently, the library only supports calculations on Arrays, and it also returns Arrays.

It may be worthwhile to define scalar methods too.

julia> IVM.sin(1.1)
ERROR: MethodError: no method matching sin(::Float64)
You may have intended to import Base.sin
Closest candidates are:
  sin(::Array{Float32,N} where N) at C:\Users\yahyaaba\.julia\packages\IntelVectorMath\Gb348\src\setup.jl:72
  sin(::Array{Float64,N} where N) at C:\Users\yahyaaba\.julia\packages\IntelVectorMath\Gb348\src\setup.jl:72
Stacktrace:
 [1] top-level scope at none:0
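As a rough sketch of what a scalar method could look like on top of an array-only API, one option is to wrap the scalar in a length-1 Vector and unwrap the result. The names `scalarize`, `vec_sin` and `scalar_sin` are hypothetical and not part of IntelVectorMath; `vec_sin` is a Base-only stand-in for an array-only routine like `IVM.sin`, so the code runs without the package (`only` needs Julia 1.4 or later).

```julia
# Hypothetical helper (not IntelVectorMath API): given a function that only
# accepts arrays, return a scalar version by wrapping the input in a
# length-1 Vector and unwrapping the single result with `only`.
scalarize(vecf) = (x::Union{Float32,Float64}) -> only(vecf([x]))

# Array-only stand-in for a routine such as IVM.sin (assumption), so the
# sketch runs without IntelVectorMath installed.
vec_sin(v::Vector{T}) where {T<:Union{Float32,Float64}} = sin.(v)

scalar_sin = scalarize(vec_sin)

scalar_sin(1.1)    # same value as Base.sin(1.1)
scalar_sin(1.1f0)  # Float32 input gives a Float32 result
```

This only gives the scalar method the right shape: every call still allocates two one-element arrays, which is why a native scalar entry point, if Intel provides one, would be preferable.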

This way, Intel's library would only be used to compute a single scalar value, which (if possible) would help Julia fuse for-loops with broadcasted functions and let us rely on @avx (from LoopVectorization.jl) or Julia's @simd for parallelization instead.
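A minimal sketch of the fusion argument, assuming a hypothetical scalar method existed: `Base.sin` stands in for it here so the code runs without IntelVectorMath, and `scalar_sin` and `fused!` are illustrative names only. With a scalar method, a dotted expression fuses into one loop with no temporary array, and the same work can be written as an explicit `@simd` loop.

```julia
# Stand-in for a hypothetical scalar IVM.sin(::Float64) method (assumption);
# Base.sin is used so the sketch runs on its own.
scalar_sin(x::Float64) = sin(x)

x = collect(range(0.0, 1.0; length = 1000))
y = similar(x)

# Broadcasting fuses the whole right-hand side into a single loop:
# no temporary array is allocated for scalar_sin.(x).
y .= 2 .* scalar_sin.(x) .+ 1

# The same computation as an explicit loop; @simd permits (but does not
# guarantee) SIMD vectorization of the loop body.
function fused!(y, x)
    @inbounds @simd for i in eachindex(x, y)
        y[i] = 2 * scalar_sin(x[i]) + 1
    end
    return y
end

z = fused!(similar(x), x)
```

Note that `@simd` only allows the compiler to vectorize; if the scalar function is an opaque call into a C library, LLVM generally cannot inline or vectorize it.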

We should check whether Intel provides a scalar API. If it only provides a vector API, and the function call already uses the CPU's vector processing unit, we cannot parallelize the function any further: that would be like vectorizing an already vectorized function (albeit with a length of 1), which has no effect.
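To make the cost of the vector-only case concrete, here is a small sketch comparing one call per element on length-1 arrays with a single call on the whole array. `vec_sin`, `per_element`, `whole` and `compare_allocations` are hypothetical names, and `vec_sin` is a Base stand-in for an array-only routine such as `IVM.sin`.

```julia
# Array-only stand-in for a vector routine such as IVM.sin (assumption).
vec_sin(v) = sin.(v)

# Calling the vector routine once per element, on length-1 arrays.
per_element(x) = [only(vec_sin([xi])) for xi in x]

# Calling the vector routine once on the whole array.
whole(x) = vec_sin(x)

function compare_allocations(x)
    per_element(x); whole(x)          # compile both before measuring
    a_per = @allocated per_element(x)
    a_whole = @allocated whole(x)
    return a_per, a_whole
end

x = rand(1_000)
a_per, a_whole = compare_allocations(x)
```

Both give the same values, but the per-element version typically allocates two small arrays per element, on top of whatever vectorization the routine does internally on a length-1 input.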

This is related to https://github.com/JuliaMath/IntelVectorMath.jl/issues/43, which could help implement the third macro.

This could also solve https://github.com/JuliaMath/IntelVectorMath.jl/issues/22 by using Intel only for the scalar call and providing SVML-like behavior through @avx or @simd.

Places to look into:

Crown421 commented 4 years ago

Intriguing. I suppose a few tests would be necessary to see whether the speed is comparable with Base.
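One way such a test could be structured, as a sketch only: check that a vectorized implementation agrees with Base, then time both. `compare_with_base` and `stand_in_sin` are hypothetical names; with IntelVectorMath installed one would presumably pass `IVM.sin` as `vecf`, but here a Base broadcast stands in so the code runs on its own.

```julia
# Hypothetical harness: verify a vectorized implementation against Base,
# then compare average wall-clock time per call over `reps` repetitions.
function compare_with_base(vecf, basef, x; reps = 100)
    @assert vecf(x) ≈ basef.(x)           # results must agree first
    vecf(x); basef.(x)                     # warm up (trigger compilation)
    t_vec  = @elapsed for _ in 1:reps; vecf(x); end
    t_base = @elapsed for _ in 1:reps; basef.(x); end
    return (vectorized = t_vec / reps, base = t_base / reps)
end

# Stand-in for an array-only routine such as IVM.sin (assumption).
stand_in_sin(v) = sin.(v)

t = compare_with_base(stand_in_sin, sin, rand(10_000))
```

`@elapsed` is noisy for short calls; a package such as BenchmarkTools.jl (`@btime`) gives more reliable numbers.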

aminya commented 4 years ago

> Intriguing. I suppose a few tests would be necessary to see whether the speed is comparable with Base.

We should check whether Intel provides a scalar API. If it only provides a vector API, and the function call already uses the CPU's vector processing unit, we cannot parallelize the function any further: that would be like vectorizing an already vectorized function (albeit with a length of 1), which has no effect.