PaulXiCao opened 5 years ago:
There seems to be no performance boost from the specialized MPFR routine; `mpfr_j0` and the general `mpfr_jn` take the same time:
```julia
julia> z = BigFloat(); import Base.MPFR: ROUNDING_MODE  # assumed setup; not shown in the original post

julia> x = 10; @time ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)
1

julia> n = 0; x = 10; @time ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, ROUNDING_MODE[])
  0.000070 seconds (12 allocations: 394 bytes)
```
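For reference, the same MPFR routines should also be reachable through the public wrappers, so the comparison can be run without raw `ccall`s. A minimal check, assuming SpecialFunctions.jl is loaded and its `BigFloat` methods are the ones wired to MPFR as in the calls above:

```julia
using SpecialFunctions

besselj0(big"10.0")    # BigFloat method backed by mpfr_j0
besselj(0, big"10.0")  # BigFloat method backed by mpfr_jn with n = 0
```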
I looked around, and most other software also does not distinguish between orders 0, 1, and n.
Using BenchmarkTools, one can see that the two BigFloat paths run at the same speed, whereas the Float64 implementation is roughly twice as fast when using `besselj0(x)` instead of `besselj(0,x)`:
```julia
julia> n = 0; x = 10; @benchmark ccall((:mpfr_jn, :libmpfr), Int32, (Ref{BigFloat}, Clong, Ref{BigFloat}, Int32), z, n, x, ROUNDING_MODE[])
BenchmarkTools.Trial:
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.922 μs (0.00% GC)
  median time:      31.226 μs (0.00% GC)
  mean time:        31.518 μs (0.00% GC)
  maximum time:     94.123 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> x = 10; @benchmark ccall((:mpfr_j0, :libmpfr), Int32, (Ref{BigFloat}, Ref{BigFloat}, Int32), z, x, ROUNDING_MODE[])
BenchmarkTools.Trial:
  memory estimate:  394 bytes
  allocs estimate:  12
  --------------
  minimum time:     30.202 μs (0.00% GC)
  median time:      30.553 μs (0.00% GC)
  mean time:        30.859 μs (0.00% GC)
  maximum time:     118.430 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> libm = Base.libm_name;  # assumed setup; not shown in the original post

julia> x = 10; @benchmark ccall(("j0", libm), Float64, (Float64,), x)
BenchmarkTools.Trial:
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     625.947 ns (0.00% GC)
  median time:      642.450 ns (0.00% GC)
  mean time:        656.163 ns (1.80% GC)
  maximum time:     118.823 μs (99.40% GC)
  --------------
  samples:          10000
  evals/sample:     171

julia> nu = 0; x = 10; @benchmark ccall(("jn", libm), Float64, (Cint, Float64), nu, x)
BenchmarkTools.Trial:
  memory estimate:  16 bytes
  allocs estimate:  1
  --------------
  minimum time:     1.128 μs (0.00% GC)
  median time:      1.168 μs (0.00% GC)
  mean time:        1.179 μs (0.00% GC)
  maximum time:     4.927 μs (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     10
```
What about only offering the `besselj(n,x)` interface, but making specialized calls for n = 0 and n = 1 internally and falling back to the general routine otherwise? A sketch of the idea follows.
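A minimal sketch of that dispatch. The kernel names are hypothetical stand-ins, wired here to the C library calls benchmarked above purely for illustration:

```julia
const libm = Base.libm_name  # the libm shipped with Julia

# Hypothetical kernels standing in for the specialized/general implementations.
besselj0_kernel(x::Float64) = ccall(("j0", libm), Float64, (Float64,), x)
besselj1_kernel(x::Float64) = ccall(("j1", libm), Float64, (Float64,), x)
besseljn_kernel(n::Integer, x::Float64) = ccall(("jn", libm), Float64, (Cint, Float64), n, x)

# Single public entry point that branches to the fast special cases.
function besselj(n::Integer, x::Float64)
    n == 0 && return besselj0_kernel(x)
    n == 1 && return besselj1_kernel(x)
    return besseljn_kernel(n, x)
end
```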
Agreed. Could you open a PR?
I'd probably recommend keeping the `besselj0`-and-friends syntax. SciPy actually does differentiate these, and they are among the most heavily used functions.
I'm not sure a microbenchmark like this will capture the performance gain, because of branch prediction (or however that propagates from the C library?). Almost all of these algorithms fall back to specialized calls for nu = 0, 1, so it is probably good to separate them; one way to stress the branch predictor is sketched below.
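A possible way to make the branch less predictable in the microbenchmark, assuming the `besselj` under test is in scope, is to randomize the order in the benchmark setup:

```julia
using BenchmarkTools

# Drawing a fresh order per sample defeats the branch predictor, so the cost
# of the n == 0 / n == 1 checks is actually reflected in the timing.
@benchmark besselj(n, x) setup = (n = rand(0:2); x = 20.0 * rand())
```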
Though Bessels.jl is still in development, it is already faster there to avoid the branches in the generic `besselj` call:
```julia
julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  9.425 ns … 29.029 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     9.510 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.532 ns ±  0.412 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▄     █     ▅     ▃     ▂    ▁                              ▁
  ▇▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▁█▁▁▁▁▆▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▅▁▁▁▁▅ █
  9.43 ns      Histogram: log(frequency) by time      9.84 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels.besselj1(x) setup=(x=rand())
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  6.375 ns … 29.542 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     6.459 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.551 ns ±  0.796 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃  █  ▇  ▄  ▂  ▂  ▄  ▁  ▁                                  ▂
  ▆▁▁█▁▁█▁▁█▁▁█▁▁▁█▁▁▆▁▁█▁▁█▁▁█▁▁▁█▁▁▇▁▁▆▁▁█▁▁▁█▁▁▇▁▁▇▁▁▇▁▁▆ █
  6.38 ns      Histogram: log(frequency) by time      7.12 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
It's a bit surprising to me that the branch is so significant in cost compared to the Bessel computation… Couldn't we set up the `besselj(n,x)` routine so that constant propagation can eliminate the `n == 0` and `n == 1` checks when `n` is a literal `0` or `1`?
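As a sketch of how that could look (not a claim about the current implementation): if the wrapper is small enough to inline, a literal order should let the compiler fold the checks away at the call site, and `@code_typed` can confirm whether it did. `_besselj_general` here is a hypothetical stand-in for the general-order routine.

```julia
# Sketch only; assumes besselj0/besselj1 kernels are in scope.
@inline function besselj(n::Integer, x::Float64)
    n == 0 && return besselj0(x)   # should fold away when n is a literal 0
    n == 1 && return besselj1(x)   # should fold away when n is a literal 1
    return _besselj_general(n, x)
end

# Check whether the branches were eliminated for a literal order:
# julia> @code_typed (x -> besselj(0, x))(10.0)
```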
I was honestly a bit surprised too. I will probably open an issue over at Bessels.jl to continue this discussion there, because this 3-4 ns difference holds across all the input ranges...
```julia
julia> @benchmark Bessels.besselj1(x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 997 evaluations.
 Range (min … max):  20.436 ns … 86.552 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     20.855 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   20.943 ns ±  1.397 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█▇▃▁█▇▄ ▁▃ ▁▃                                              ▂
  ████████▇███▇████▆▇██▇▆▆█▆▅▆▃▁▄▁▁▁▃▁▁▃▁▄▁▁▁▁▁▃▃▄▄▁▄▃▃▃▄▄▄▃▄ █
  20.4 ns      Histogram: log(frequency) by time      25.2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark Bessels._besselj(1.0, x) setup=(x=rand()+100.0)
BenchmarkTools.Trial: 10000 samples with 996 evaluations.
 Range (min … max):  24.598 ns … 53.087 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     25.268 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   25.216 ns ±  1.259 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▃█      ▃▆
  ▂██▃▂▂▁▁▃██▆▄▂▂▂▂▁▁▂▂▂▂▂▂▁▂▁▂▁▂▁▂▁▁▁▁▁▂▂▂▂▂▁▁▁▂▁▂▂▂▁▂▂▂▁▂▁▂ ▃
  24.6 ns        Histogram: frequency by time        28.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```
I'm hoping this has something to do with the generic implementation for arbitrary `nu` still being actively worked on, so the benchmarking might be a little off for now. Though I would think this function should be able to specialize better for `nu = 0, 1`?
```julia
function _besselj(nu, x)
    # Specialized fast paths for the two most common orders.
    nu == 0 && return besselj0(x)
    nu == 1 && return besselj1(x)

    if x < 20.0
        if nu > 60.0
            return log_besselj_small_arguments_orders(nu, x)
        else
            return besselj_small_arguments_orders(nu, x)
        end
    elseif 2 * x < nu
        # Uniform asymptotic (Debye) expansion for large order.
        return besselj_debye(nu, x)
    elseif x > nu
        # Asymptotic expansion for large argument.
        return besselj_large_argument(nu, x)
    else
        return nothing  # remaining region not handled yet
    end
end
```
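If those two runtime checks really are the cost, one illustrative alternative (not Bessels.jl API; `besselj_static` is a hypothetical name) is to lift the order into the type domain so the branch is resolved at compile time:

```julia
# Illustrative Val-based front end over the runtime-dispatch function above.
besselj_static(::Val{0}, x) = besselj0(x)
besselj_static(::Val{1}, x) = besselj1(x)
besselj_static(::Val{N}, x) where {N} = _besselj(N, x)  # general fallback

# besselj_static(Val(1), 10.0) then compiles directly to the besselj1 kernel.
```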
What is the reason that duplicate calls to the Bessel functions are available, for instance `besselj0(x)` alongside `besselj(0, x)`? Isn't that an implementation detail that the library user should not care about?