Open CarloLucibello opened 3 years ago
VectorizationBase.jl may be a better place for this issue, but this is fine too. `erf` only gets partial credit at the moment.
If someone wants to work on it, some of these can probably be translated fairly directly from SpecialFunctions.jl to handle `VectorizationBase.AbstractSIMD` arguments.
Some may require functions like `SLEEFPirates.cot`, which could be moved to `VectorizationBase`.
However, a direct translation isn't always possible, for example when an implementation behaves differently depending on the argument range, such as using a Taylor series for small values and recurrences for large ones.
Also of interest: https://github.com/JuliaMath/openspecfun/blob/master/Faddeeva/Faddeeva.cc
SpecialFunctions doesn't have Julia implementations for all of these yet.
I'd be happy to answer any questions that I can if you (or anyone else) want to take a stab at it.
If for many functions it is just a matter of broadening the allowed input types, would it make sense to just have SpecialFunctions depend on VectorizationBase?
They'd also need to replace `if`s with `IfElse.ifelse`.
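To make the `if` to `ifelse` rewrite concrete, here is a minimal sketch. The range-split function below is hypothetical (it is not taken from SpecialFunctions.jl), the 0.01 threshold is an arbitrary choice, and it uses `Base.ifelse` so that it runs without extra packages. For `VectorizationBase.AbstractSIMD` arguments the call would presumably become `IfElse.ifelse`.

```julia
# Hypothetical range-split function, for illustration only:
# a short Taylor series near zero, the direct formula elsewhere.
taylor_small(x) = x * (1 + x * (1/2 + x * (1/6 + x / 24)))  # x + x^2/2 + x^3/6 + x^4/24
direct_large(x) = exp(x) - 1

# Branchy version: `abs(x) < 0.01` has to be a single Bool,
# so this form only works one scalar at a time.
function expm1_branchy(x)
    if abs(x) < 0.01
        return taylor_small(x)
    else
        return direct_large(x)
    end
end

# Branch-free version: evaluate both paths, then select per element.
# Base.ifelse is used here; swapping in IfElse.ifelse is the assumed
# change needed for SIMD vector arguments.
expm1_select(x) = ifelse(abs(x) < 0.01, taylor_small(x), direct_large(x))
```

The price of the branch-free form is that both paths are always evaluated, so it only pays off when neither path can throw and both are reasonably cheap.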
Loops such as in `digamma` are trickier. Long term, I think it'd be cool to write something able to compile these functions automatically to be SIMD, in a manner similar to ISPC.
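A rough sketch of why such loops are harder: the trip count depends on the argument, so different lanes of a vector would want to exit at different times. One common workaround is to keep iterating while any lane is still active and to update only the active lanes through a mask. The code below is a simplified, hypothetical digamma-style recurrence (not the SpecialFunctions.jl implementation), and a plain `NTuple` stands in for a SIMD vector.

```julia
# Scalar loop whose trip count depends on x (assumes x > 0).
function shift_branchy(x)
    acc = zero(x)
    while x < 7
        acc -= 1 / x
        x += 1
    end
    return acc, x
end

# Masked variant over several "lanes" (an NTuple stands in for a SIMD vector):
# loop while any lane is active, and update only the active lanes.
function shift_masked(xs::NTuple{N,Float64}) where {N}
    accs = ntuple(_ -> 0.0, N)
    active = map(x -> x < 7, xs)          # per-lane mask
    while any(active)
        accs = map((a, x, m) -> ifelse(m, a - 1 / x, a), accs, xs, active)
        xs = map((x, m) -> ifelse(m, x + 1, x), xs, active)
        active = map(x -> x < 7, xs)
    end
    return accs, xs
end
```

Every lane pays for as many iterations as the slowest lane needs, which is the usual trade-off of masked execution.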
`VectorizationBase` has unfortunately gotten a bit heavy as a dependency.
julia -O3 -q --startup=no -e '@time using VectorizationBase'
0.561828 seconds (1.97 M allocations: 109.844 MiB, 1.22% gc time, 4.59% compilation time)
julia -O3 -q --startup=no -e '@time using VectorizationBase'
0.520969 seconds (1.97 M allocations: 109.829 MiB, 1.33% gc time, 4.80% compilation time)
julia -O3 -q --startup=no -e '@time using VectorizationBase'
0.575595 seconds (1.97 M allocations: 109.829 MiB, 1.22% gc time, 4.35% compilation time)
On another computer:
0.872136 seconds (2.06 M allocations: 117.637 MiB, 1.62% gc time, 4.06% compilation time)
0.870963 seconds (2.06 M allocations: 117.639 MiB, 1.62% gc time, 3.91% compilation time)
0.899115 seconds (2.06 M allocations: 117.637 MiB, 1.61% gc time, 7.04% compilation time)
I'll have to take a look at how much this can be improved.
I could split off at least some of the hardware-related parts, but the `llvmcall` code would still want to depend on at least the `cpu_feature` parts.
If possible, it would be helpful to add a more specific warning when one uses a special function. I came here after spending an hour before realizing that the "`LoopVectorization.check_args` on your inputs failed" warning was caused by my use of `logfactorial`.
This is a follow-up of #232. Right now I would be interested in LV's support of the whole `erf` family (`erfc` and `erfcx` in particular), but since we are at it, maybe it is useful to track the whole special functions' family: