JuliaMath / SpecialFunctions.jl

Special mathematical functions in Julia
https://specialfunctions.juliamath.org/stable/

duplicate bessel function definitions #178

Open PaulXiCao opened 5 years ago

PaulXiCao commented 5 years ago

Why are there duplicate definitions of the Bessel functions? For instance:

besselj(nu,z)
besselj0(x)
besselj1(x)

Isn't that an implementation detail that the library user should not care about?
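For context, the specialized and generic entry points compute the same mathematical function; a quick check (assuming SpecialFunctions is loaded):

julia> using SpecialFunctions

julia> besselj0(10.0) ≈ besselj(0, 10.0)  # same value, different code paths
true

julia> besselj1(10.0) ≈ besselj(1, 10.0)
true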

PaulXiCao commented 5 years ago

There seems to be no performance boost for the MPFR (BigFloat) implementation:

julia> using Base.MPFR: ROUNDING_MODE

julia> z = BigFloat(); x = BigFloat(10);

julia> @time ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)
1

julia> n = 0; @time ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)

PaulXiCao commented 5 years ago

I looked around, and most other software also does not distinguish between orders 0, 1, and n.

PaulXiCao commented 5 years ago

Using BenchmarkTools, one can see that the BigFloat implementation runs at the same speed, whereas the Float64 implementation is twice as fast when using besselj0(x) instead of besselj(0, x):

julia> z = BigFloat(); n = 0; x = BigFloat(10); @benchmark ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, ROUNDING_MODE[])
BenchmarkTools.Trial: 
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.922 μs (0.00% GC)
  median time:      31.226 μs (0.00% GC)
  mean time:        31.518 μs (0.00% GC)
  maximum time:     94.123 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> z = BigFloat(); x = BigFloat(10); @benchmark ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, ROUNDING_MODE[])
BenchmarkTools.Trial: 
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.202 μs (0.00% GC)
  median time:      30.553 μs (0.00% GC)
  mean time:        30.859 μs (0.00% GC)
  maximum time:     118.430 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x = 10.0; @benchmark ccall(("j0", "libopenlibm"), Float64, (Float64,), x)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     625.947 ns (0.00% GC)
  median time:      642.450 ns (0.00% GC)
  mean time:        656.163 ns (1.80% GC)
  maximum time:     118.823 μs (99.40% GC)
  --------------
  samples:          10000
  evals/sample:     171

julia> nu = 0; x = 10.0; @benchmark ccall(("jn", "libopenlibm"), Float64, (Cint, Float64), nu, x)
BenchmarkTools.Trial: 
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.128 μs (0.00% GC)
  median time:      1.168 μs (0.00% GC)
  mean time:        1.179 μs (0.00% GC)
  maximum time:     4.927 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10
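One caveat about numbers like these: @benchmark called on non-const globals also measures untyped-global access and dynamic dispatch. BenchmarkTools supports interpolating values with $ so only the call itself is timed; a sketch:

julia> using BenchmarkTools

julia> x = 10.0;

julia> # interpolate `x` so the benchmark measures the ccall, not global lookup
       @benchmark ccall(("j0", "libopenlibm"), Float64, (Float64,), $x)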

What about offering only the besselj(nu, x) interface, but dispatching internally to specialized implementations for nu = 0, 1, and the general case?
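A minimal sketch of that idea (names hypothetical; _besselj_general stands in for whatever the existing generic implementation is):

function besselj(nu::Integer, x::Real)
    # forward the two most common orders to the specialized kernels
    nu == 0 && return besselj0(x)
    nu == 1 && return besselj1(x)
    # hypothetical placeholder for the generic algorithm
    return _besselj_general(nu, x)
end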

musm commented 4 years ago

Agreed.

Could you open a PR?

heltonmc commented 2 years ago

I'd probably recommend keeping besselj0 and friends. SciPy actually does make this distinction, and these functions are among the most heavily used.

I'm not sure a microbenchmark like this will capture the performance gain, because of branch prediction (or how that propagates from the C library?). Almost all of these algorithms dispatch to specialized implementations for nu = 0, 1, so it is probably good to keep them separate.

Though Bessels.jl is still in development, it is faster to avoid the branches of the generic besselj call:

julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  9.425 ns … 29.029 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.510 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.532 ns ±  0.412 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▄     █     ▅     ▃     ▂                             ▁
  ▇▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▆▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▅ █
  9.43 ns      Histogram: log(frequency) by time     9.84 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.besselj1(x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  6.375 ns … 29.542 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.459 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.551 ns ±  0.796 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

     ▃  █  ▇  ▄   ▂     ▂  ▄  ▁                ▁             ▂
  ▆▁▁█▁▁█▁▁█▁▁█▁▁▁█▁▁▆▁▁█▁▁█▁▁█▁▁▁█▁▁▇▁▁▆▁▁█▁▁▁█▁▁▇▁▁▇▁▁▇▁▁▆ █
  6.38 ns      Histogram: log(frequency) by time     7.12 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

stevengj commented 2 years ago

It's a bit surprising to me that the branch is so significant in cost compared to the Bessel computation…

Couldn't we set up the besselj(n,x) routine so that constant-propagation can eliminate the n==0 and n==1 checks when n is a literal 0 or 1?
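An alternative that guarantees the order check is resolved at compile time is dispatching on Val; a sketch with hypothetical names (the generic fallback _besselj_general is a placeholder):

# Dispatch on the order at the type level, so there is no runtime branch:
# `besselj(Val(0), x)` with a literal 0 is resolved during compilation.
besselj(::Val{0}, x::Real) = besselj0(x)
besselj(::Val{1}, x::Real) = besselj1(x)
besselj(::Val{N}, x::Real) where {N} = _besselj_general(N, x)  # hypothetical fallback

Plain constant propagation through an inlined besselj(n, x) would achieve the same when n is a literal, without changing the user-facing API.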

heltonmc commented 2 years ago

I was honestly a bit surprised too. I will probably open an issue over at Bessels.jl to continue this discussion there, because this 3-4 ns difference holds across all the input ranges:

julia> @benchmark Bessels.besselj1(x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  20.436 ns … 86.552 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.855 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.943 ns ±  1.397 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▇▃▁█▇▄ ▁▃   ▁▃                                            ▂
  ████████▇███▇████▆▇██▇▆▆█▆▅▆▃▁▄▁▁▁▃▁▁▃▁▄▁▁▁▁▁▃▃▄▄▁▄▃▃▃▄▄▄▃▄ █
  20.4 ns      Histogram: log(frequency) by time      25.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  24.598 ns … 53.087 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.268 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.216 ns ±  1.259 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▃█      ▃▆                                                  
  ▂██▃▂▂▁▁▃██▆▄▂▂▂▂▁▁▂▂▂▂▂▂▁▂▁▂▁▂▁▂▁▁▁▁▁▂▂▂▂▂▁▁▁▂▁▂▂▂▁▂▂▂▁▂▁▂ ▃
  24.6 ns         Histogram: frequency by time        28.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

I'm hoping this is just because the generic implementation for arbitrary nu is still under active development, so the benchmarks might be a little off for now.

Though I would have thought this function could specialize better for nu = 0, 1:

function _besselj(nu, x)
    # branch-free specialized kernels for the two most common orders
    nu == 0 && return besselj0(x)
    nu == 1 && return besselj1(x)
    if x < 20.0
        if nu > 60.0
            return log_besselj_small_arguments_orders(nu, x)
        else
            return besselj_small_arguments_orders(nu, x)
        end
    elseif 2 * x < nu
        return besselj_debye(nu, x)
    elseif x > nu
        return besselj_large_argument(nu, x)
    else
        return nothing  # intermediate region not yet implemented
    end
end