simonbyrne opened this issue 9 years ago
We used to generate Beta variates from two Γ variates. Was there a problem with doing that?
That should work, though R uses a different method.
One can circumvent the sampler by specializing the `rand` method for the single-sample case.

If we are doing this, we should consider allowing users to specify an instance of `AbstractRNG`.
> One can circumvent the sampler by specializing the `rand` method for the single-sample case.
@lindahua That seems like the best option so far, though it requires duplicating a lot of code.
> If we are doing this, we should consider allowing users to specify an instance of `AbstractRNG`.
Definitely. I think the rough idea is to copy the `IO` interface by making the first argument an instance of `AbstractRNG`.
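A minimal sketch of that convention, with a hypothetical distribution type (the names here are illustrative, not the package's API), assuming current Julia's `Random.default_rng()`:

```julia
using Random

struct Coin end  # hypothetical distribution type for illustration

# RNG-first method, mirroring Base's IO-first methods like print(io, x):
Base.rand(rng::AbstractRNG, ::Coin) = rand(rng, Bool)

# Fallback that supplies the global RNG when none is given.
Base.rand(c::Coin) = rand(Random.default_rng(), c)
```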
We probably don't need a lot of duplication: one can wrap the core implementation in an internal function (or a small number of functions), and have both the sampler and the single-sample `rand` call that function, as sketched below.
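A sketch of that layering, with hypothetical names: one internal core routine, called by both the single-sample `rand` method and the sampler (the sampling logic here is a placeholder):

```julia
using Random

struct MyDist
    θ::Float64
end

struct MyDistSampler
    θ::Float64  # precomputed constants would live here
end

# Core implementation: all actual sampling logic in one place.
# (Illustrative body: θ * randexp(rng) draws from Exponential with scale θ.)
_rand_impl(rng::AbstractRNG, θ::Float64) = θ * randexp(rng)

# Single-sample method calls the core directly, bypassing the sampler.
Base.rand(rng::AbstractRNG, d::MyDist) = _rand_impl(rng, d.θ)

# The sampler calls the same core, so nothing is duplicated.
sampler(d::MyDist) = MyDistSampler(d.θ)
Base.rand(rng::AbstractRNG, s::MyDistSampler) = _rand_impl(rng, s.θ)
```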
I'm surprised that we have stopped using the Julia implementations for Γ, Beta, χ^2. Has there been an issue on that?
The `test/samplers.jl` script, which calls the `test_samples` function, has provided a thorough check of all the samplers.
What it does is generate a million samples from each sampler, then count the occurrences of each value (for discrete distributions) or the number of samples falling in each small bin (for continuous distributions). For each of these counts, it computes a confidence interval (based on the Binomial distribution) and checks whether the actual count lies within that interval.
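In outline, the check looks something like the following (a sketch of the idea, not the actual test code; the function name is hypothetical):

```julia
using Distributions

# Draw n samples, bin them, and check each bin count against a two-sided
# Binomial confidence interval at level 1 - α.
function check_bins(d::ContinuousUnivariateDistribution, n::Int,
                    edges::AbstractRange; α = 1e-4)
    samples = rand(d, n)
    for i in 1:length(edges)-1
        lo, hi = edges[i], edges[i+1]
        p = cdf(d, hi) - cdf(d, lo)              # theoretical bin probability
        c = count(x -> lo <= x < hi, samples)    # observed bin count
        b = Binomial(n, p)
        cmin, cmax = quantile(b, α/2), cquantile(b, α/2)
        cmin <= c <= cmax || error("bin [$lo, $hi): count $c outside [$cmin, $cmax]")
    end
    return true
end

check_bins(Normal(), 10^6, -4.0:0.25:4.0)
```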
Even the exponential distribution relies on Rmath to generate random numbers, even though there is a very simple formula for this.
It may be that the most obvious/simple way to generate random numbers may not be the optimal way (in terms of efficiency or accuracy).
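For reference, the simple formula in question is inversion: if U ~ Uniform(0,1), then -θ·log(U) follows an Exponential distribution with scale θ. A one-line sketch (hypothetical helper name, not the package's implementation):

```julia
using Random

# Inversion sampling: -θ * log(U) ~ Exponential(scale = θ) for U ~ Uniform(0,1).
rand_exp(rng::AbstractRNG, θ::Real) = -θ * log(rand(rng))
```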
> Even the exponential distribution relies on Rmath to generate random numbers
On master, `rand(d::Exponential)` now calls `Base.Random.randmtzig_exprnd` directly.
Good.
Looks like SciPy and Boost can be a source of inspiration when we implement these. Both have very permissive licenses.
Is this still something that we want to do? Has there been any progress on this?
Yes, it is something we want to do.
Some code has been written (see `src/samplers/`), but it isn't live yet. I outlined some of the obstacles here. Suggestions/contributions welcome.
Any updates on this front?
I did a little bit of testing of the pure-Julia samplers in `src/samplers/` a few months back, and on 0.6 they were still slower than R even if we avoided recreating the sampler on each call. Might be worth testing on 0.7 now, though.
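For anyone rerunning that comparison, a minimal benchmark sketch (assuming BenchmarkTools; reusing the sampler object avoids reconstructing it on every call):

```julia
using Distributions, BenchmarkTools

d = Gamma(2.5, 1.0)
s = sampler(d)     # constructs the sampler (and its precomputed constants) once

@btime rand($d)    # goes through sampler machinery on each call
@btime rand($s)    # reuses the precomputed sampler
```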
As the Rmath-based `rand` methods are now broken in 0.4 (https://github.com/JuliaLang/julia/issues/8874), this seems like as good a time as any to implement them in Julia. We need:

- [x] `Beta` (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830). Can generate two `Chisq`s and take a/(a+b) (see the sketch at the end of this comment), or follow R, which uses "Generating beta variates with nonintegral shape parameters" by R.C.H. Cheng.
- [x] `Binomial` (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)
- [x] `Chisq` (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830). Can use the `Gamma` method.
- [x] `FDist` (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830). Can use a ratio of `Chisq`s.
- [x] `Gamma` (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)
- [ ] `Hypergeometric`. R uses "Computer generation of hypergeometric random variates" by Kachitvichyanukul and Schmeiser, though apparently there's an error in there somewhere. NumPy uses ratio-of-uniforms, which seems to be the HRUA algorithm.
- [ ] `NegativeBinomial`
- [ ] `NoncentralChisq`
- [ ] `Poisson`
- [ ] `Skellam`
- [x] `TDist` (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830). Use a `Normal`/`Chisq` mixture.

Some of these have been implemented in the `src/samplers/` directory. They still need some checking and performance tuning.

The rough idea, if I recall correctly, is to define a `sampler` method for each distribution that returns a "Sampler" object, which chooses the appropriate algorithm and precomputes constants. While this seems to be fairly efficient for large numbers of draws from the same distribution, I have noticed problems where the distribution changes on each iteration (as would happen in something like Gibbs sampling): type instability of the sampler object gives a huge performance penalty.
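As referenced in the `Beta` item above, a minimal sketch of the Gamma-ratio method (this illustrates the identity, not the package's implementation): if X ~ Gamma(a, 1) and Y ~ Gamma(b, 1), then X/(X+Y) ~ Beta(a, b).

```julia
using Distributions, Random

# Beta(a, b) via the Gamma ratio: X/(X+Y) with X ~ Gamma(a), Y ~ Gamma(b).
function rand_beta_via_gamma(rng::AbstractRNG, a::Real, b::Real)
    x = rand(rng, Gamma(a))
    y = rand(rng, Gamma(b))
    return x / (x + y)
end

rand_beta_via_gamma(Random.default_rng(), 2.0, 3.0)
```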
I think we can update a few checks in here. `NegativeBinomial`, `Poisson`, and `Skellam` have been updated, so that leaves us with the noncentrals and `Hypergeometric`, right?