JuliaSIMD / LoopVectorization.jl

Macro(s) for vectorizing loops.
MIT License
741 stars 66 forks

tracking Special Functions' support #233

Open CarloLucibello opened 3 years ago

CarloLucibello commented 3 years ago

This is a follow-up of #232. Right now I am mainly interested in LV's support for the whole erf family (erfc and erfcx in particular), but while we are at it, it may be useful to track the whole SpecialFunctions.jl family:

chriselrod commented 3 years ago

VectorizationBase.jl may be a better place for this issue, but this is fine too. erf only gets partial credit at the moment.

If someone wants to work on it, some of these can probably be translated fairly directly from SpecialFunctions.jl to handle VectorizationBase.AbstractSIMD arguments. Some may require functions like SLEEFPirates.cot, which could be moved to VectorizationBase.
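To illustrate the kind of direct translation meant here, a hedged sketch (the kernel and coefficients below are placeholders, not actual SpecialFunctions.jl code): many scalar kernels are just polynomial evaluations, and writing them without concrete type annotations lets the same method accept `VectorizationBase.AbstractSIMD` arguments.

```julia
# Hypothetical kernel with placeholder coefficients. A SpecialFunctions-style
# implementation might restrict the signature to ::Float64; dropping that
# annotation lets the method accept any number-like type, including
# VectorizationBase.AbstractSIMD vectors, since evalpoly only needs * and +.
_kernel(x) = evalpoly(x, (1.0, 0.5, 0.25))
```

Because `evalpoly` lowers to a chain of `muladd`s, the same code evaluates per lane on a SIMD vector with no further changes.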

chriselrod commented 3 years ago

However, a direct translation isn't always possible, e.g. when an implementation is broken up to behave differently depending on the input range: a Taylor series for small values versus recurrences for large ones.
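A minimal sketch of the problematic pattern (hypothetical code, not taken from SpecialFunctions.jl): when the algorithm itself changes with the input range, a SIMD vector whose lanes straddle the cutoff has no single code path to follow.

```julia
# Hypothetical scalar special function: the branch selects a different
# algorithm per range, so the whole function cannot be applied to a SIMD
# vector of lanes directly.
function special_scalar(x)
    if x < 1.0
        # Taylor-series-style branch for small arguments
        return x - x^3 / 3
    else
        # recurrence/asymptotic-style branch for large arguments
        return 1.0 - inv(x)
    end
end
```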

Also of interest: https://github.com/JuliaMath/openspecfun/blob/master/Faddeeva/Faddeeva.cc. SpecialFunctions doesn't have Julia implementations for all of these yet.

I'd be happy to answer any questions that I can if you (or anyone else) wants to take a stab at it.

CarloLucibello commented 3 years ago

If for many functions it is just a matter of broadening the allowed input types, would it make sense to just have SpecialFunctions depend on VectorizationBase?

chriselrod commented 3 years ago

> If for many functions it is just a matter of broadening the allowed input types, would it make sense to just have SpecialFunctions depend on VectorizationBase?

They'd also need to replace ifs with IfElse.ifelse. Loops such as in digamma are trickier. Long term, I think it'd be cool to write something able to compile these functions automatically to be SIMD, in a manner similar to ISPC.
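As a hedged sketch of the `ifelse` rewrite (the function `f` below is illustrative, not from either package): both arms are computed unconditionally and `IfElse.ifelse` selects the result per lane, which is what makes the expression SIMD-friendly. This assumes the IfElse.jl package is installed.

```julia
import IfElse  # assumes the IfElse.jl package is available

# Branchy scalar version: fine for scalars, but the branch blocks SIMD.
f(x) = x > 0 ? exp(-x) : one(x)

# Branchless rewrite: both arms are evaluated, then IfElse.ifelse selects
# per lane. For plain Bool conditions it behaves like Base.ifelse, so the
# same method serves scalars and AbstractSIMD vectors.
f_simd(x) = IfElse.ifelse(x > 0, exp(-x), one(x))
```

Note the semantic difference: unlike `?:`, both arms run even when not selected, so they must be safe to evaluate on the full input range.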

VectorizationBase has unfortunately gotten a bit heavy as a dependency.

```
julia -O3 -q --startup=no -e '@time using VectorizationBase'
  0.561828 seconds (1.97 M allocations: 109.844 MiB, 1.22% gc time, 4.59% compilation time)
julia -O3 -q --startup=no -e '@time using VectorizationBase'
  0.520969 seconds (1.97 M allocations: 109.829 MiB, 1.33% gc time, 4.80% compilation time)
julia -O3 -q --startup=no -e '@time using VectorizationBase'
  0.575595 seconds (1.97 M allocations: 109.829 MiB, 1.22% gc time, 4.35% compilation time)
```

On another computer:

```
  0.872136 seconds (2.06 M allocations: 117.637 MiB, 1.62% gc time, 4.06% compilation time)
  0.870963 seconds (2.06 M allocations: 117.639 MiB, 1.62% gc time, 3.91% compilation time)
  0.899115 seconds (2.06 M allocations: 117.637 MiB, 1.61% gc time, 7.04% compilation time)
```

I'll have to take a look at how much this can be improved. I could split off at least some of the hardware-related parts, but the llvmcall code would still need to depend on/use at least the cpu_feature parts.

yuvalwas commented 1 year ago

If possible, it would be helpful to emit a more specific warning when an unsupported special function is used. I came here after spending an hour figuring out that the cause of "LoopVectorization.check_args on your inputs failed" was a call to logfactorial.