lindahua opened 10 years ago
The benchmark script is perf/samplers.jl.
That's interesting, I get different results when using the standard rand methods:
using Distributions
using RmathDist
then after warmup
julia> @time rand(Gamma(5.0,1.0),10_000_000)
elapsed time: 0.415310414 seconds (80000152 bytes allocated)
julia> @time rand(Rmath(Gamma(5.0,1.0)),10_000_000)
elapsed time: 0.4185569 seconds (80000176 bytes allocated)
The gamma distribution still uses Rmath.
I temporarily turned off the use of GammaMTSampler in a recent refactoring and haven't turned it back on yet.
Ah, I see. But now I get:
julia> @time rand(Distributions.GammaMTSampler(5.0,1.0),10_000_000)
elapsed time: 0.501402219 seconds (80000168 bytes allocated)
which is only 20% worse.
That's probably because memory allocation and memory access factor in, so the difference is not as big.
What I did in the benchmark is basically the following:
# not exactly the same code, but doing things the same way as below
function batch_sample!(sampler, n)
    for i = 1:n
        rand(sampler)
    end
end
It does the sampling only, so no memory allocation or memory access is counted in the benchmark.
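For comparison, here is a minimal sketch (not the actual benchmark code) contrasting the pure sampling loop against the allocating rand(d, N) call; sampler(d) is Distributions' API for obtaining an efficient sampler:

```julia
# Minimal sketch contrasting the two measurements (not the benchmark's code).
using Distributions

s = sampler(Gamma(5.0, 1.0))      # construct the sampler once
batch_sample!(s, 1)               # warm up (compile)
@time batch_sample!(s, 10_000_000)       # pure sampling, no output array
@time rand(Gamma(5.0, 1.0), 10_000_000)  # also allocates an 80 MB result array
```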
This one is even stranger: your tests suggest the Poisson AD sampler is twice as fast, yet:
julia> a = Distributions.PoissonADSampler(20.0)
PoissonADSampler(20.0,4.47213595499958,2400.0,18)
julia> r = Distributions.PoissonRmathSampler(20.0)
PoissonRmathSampler(20.0)
(then after warmup)
julia> @time rand(a,10_000_000);
elapsed time: 0.634922107 seconds (80000128 bytes allocated)
julia> @time rand(r,10_000_000);
elapsed time: 0.440410505 seconds (80000128 bytes allocated)
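One way to take the output array out of the picture is to time single draws, e.g. with BenchmarkTools (a sketch, assuming the internal sampler types above):

```julia
# Sketch: per-draw timing that excludes the 80 MB output-array allocation.
using BenchmarkTools, Distributions

a = Distributions.PoissonADSampler(20.0)
r = Distributions.PoissonRmathSampler(20.0)
@btime rand($a)   # cost of one AD draw
@btime rand($r)   # cost of one Rmath draw
```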
I've added some more functionality to perf/samplers.jl:
The batch method just tests the performance of the iteration (old behaviour):
$ julia perf/samplers.jl batch gamma_hi
BenchmarkTable [unit = mps]
Dist | rmath MT GD
--------------------------------------------------------------
(Gamma(shape=1.5, scale=1.0),) | 17.0605 20.6627 14.9219
(Gamma(shape=2.0, scale=1.0),) | 16.1565 20.0422 14.9253
(Gamma(shape=3.0, scale=1.0),) | 16.4016 20.1534 14.3530
(Gamma(shape=5.0, scale=1.0),) | 25.0876 21.1654 27.6697
(Gamma(shape=20.0, scale=1.0),) | 29.6898 21.9201 34.6155
The indiv method tests the performance of the construction and iteration (this may be misleading for Rmath, as it uses static variables):
$ julia perf/samplers.jl indiv gamma_hi
BenchmarkTable [unit = mps]
Dist | rmath MT GD
--------------------------------------------------------------
(Gamma(shape=1.5, scale=1.0),) | 15.7482 19.1207 7.5798
(Gamma(shape=2.0, scale=1.0),) | 15.7333 19.5103 7.7571
(Gamma(shape=3.0, scale=1.0),) | 16.1120 19.2441 7.7500
(Gamma(shape=5.0, scale=1.0),) | 24.3544 19.8842 9.8533
(Gamma(shape=20.0, scale=1.0),) | 27.0493 20.6376 11.4383
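For clarity, here is one plausible reading of what the two modes measure (hypothetical helper names; the actual code is in perf/samplers.jl):

```julia
# Hedged sketch of what `batch` vs `indiv` might measure (hypothetical helpers).
using Distributions

function run_batch(d, n)
    s = sampler(d)           # construction cost paid once, outside the loop
    for _ in 1:n
        rand(s)
    end
end

function run_indiv(d, n)
    for _ in 1:n
        rand(sampler(d))     # construction cost paid on every draw
    end
end
```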
There are obvious differences here. If the RNG is initialised identically, do we want both rand(d, N) and [rand(d) for i = 1:N] to give identical results?
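The question can be stated as a minimal sketch (assuming a shared seed):

```julia
# Sketch of the reproducibility question above, assuming a shared seed.
using Distributions, Random

d = Gamma(5.0, 1.0)
Random.seed!(123)
a = rand(d, 5)
Random.seed!(123)
b = [rand(d) for i = 1:5]
a == b   # holds only if both code paths consume the RNG identically
```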
Is this an issue to keep around? It likely can't be "fixed" directly. cc @simonbyrne @lindahua
Likely related to this: the perf folder is in a deprecated state and should be removed or updated.
I'd support / help with work on this issue. I keep running into this when figuring out the best way to sample from Categorical distributions. It looks like the poly-algorithm in StatsBase.jl performs better, as it has a nice rule of thumb (presumably based on the benchmarks from @lindahua at the beginning of this issue) for choosing the alias table vs. direct sampling.
It would be nice to get the same performance no matter how you sample from a categorical 😄
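For illustration, a sketch of the two entry points being compared (StatsBase's weighted sample is the poly-algorithm mentioned above):

```julia
# Sketch comparing two ways of drawing categorical samples.
using Distributions, StatsBase

p = [0.1, 0.2, 0.3, 0.4]
n = 1_000_000

x = rand(Categorical(p), n)              # Distributions' categorical sampler
y = sample(1:length(p), Weights(p), n)   # StatsBase's poly-algorithm
```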
This is a very old issue, but I just discovered that the cheap acceptance check in GammaMTSampler seems to be incorrect: https://github.com/JuliaStats/Distributions.jl/pull/1617#discussion_r970098297
Fixing it improves performance quite significantly:
$ env gamma=true julia --project=. samplers.jl
[ Info: Gamma
[ Info: Low
[ Info: GammaGSSampler
[ Info: α: 0.1, result: Trial(30.570 ns)
[ Info: α: 0.5, result: Trial(45.151 ns)
[ Info: α: 0.9, result: Trial(49.529 ns)
[ Info: GammaIPSampler
[ Info: α: 0.1, result: Trial(24.658 ns)
[ Info: α: 0.5, result: Trial(25.022 ns)
[ Info: α: 0.9, result: Trial(24.698 ns)
[ Info: High
[ Info: GammaMTSampler
[ Info: α: 1.5, result: Trial(13.292 ns)
[ Info: α: 2.0, result: Trial(13.209 ns)
[ Info: α: 3.0, result: Trial(12.950 ns)
[ Info: α: 5.0, result: Trial(13.127 ns)
[ Info: α: 20.0, result: Trial(13.135 ns)
[ Info: GammaGDSampler
[ Info: α: 1.5, result: Trial(29.132 ns)
[ Info: α: 2.0, result: Trial(27.114 ns)
[ Info: α: 3.0, result: Trial(24.347 ns)
[ Info: α: 5.0, result: Trial(18.196 ns)
[ Info: α: 20.0, result: Trial(16.562 ns)
With the fix, GammaMTSampler seems to be faster than GammaGDSampler for all tested parameter values (here and in https://github.com/JuliaStats/Distributions.jl/pull/1617).
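For context, GammaMTSampler implements the Marsaglia–Tsang (2000) method; below is a minimal sketch of the algorithm (not the package's actual code) showing where the cheap squeeze check sits:

```julia
# Minimal sketch of the Marsaglia–Tsang method for shape α ≥ 1, unit scale.
# This is not Distributions.jl's implementation, just the algorithm's shape.
using Random

function gamma_mt(rng::AbstractRNG, α::Real)
    d = α - 1/3
    c = 1 / sqrt(9d)
    while true
        x = randn(rng)
        v = (1 + c * x)^3
        v > 0 || continue                        # proposal outside the support
        u = rand(rng)
        # cheap "squeeze" acceptance check: avoids the log for most draws
        u < 1 - 0.0331 * x^4 && return d * v
        # full acceptance check
        log(u) < x^2 / 2 + d * (1 - v + log(v)) && return d * v
    end
end
```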
Thanks to @simonbyrne's recent efforts, we already have some in-house samplers.
I just ran a systematic benchmark of the samplers implemented in this package (using BenchmarkLite.jl).
Here are the results. They are measured in MPS, that is, million samples per second -- a larger number indicates higher throughput.
[Benchmark result tables for Categorical, Binomial, Poisson, Exponential, and Gamma]
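For reference, an MPS figure can be computed from a timed run like this (a sketch with arbitrary parameters):

```julia
# Sketch: computing million-samples-per-second (MPS) from a timed run.
using Distributions

n = 10_000_000
t = @elapsed rand(Gamma(5.0, 1.0), n)   # elapsed seconds
mps = n / t / 1e6                       # million samples per second
```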
We are still falling behind Rmath (over 2x slower) for Binomial and Gamma distributions.