Replace Rmath RNGs - Githubissues

simonbyrne commented 9 years ago

As the Rmath-based rand methods are now broken in 0.4 (https://github.com/JuliaLang/julia/issues/8874), this seems like as good as time as any to implement them in Julia. We need:

[x] Beta (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)
- Can generate two Chisqs and a/(a+b), or
- R uses "Generating beta variates with nonintegral shape parameters" by R.C.H. Cheng.
[x] Binomial (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)
[x] Chisq (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)
- Can use Gamma method.
[x] FDist (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)
- Can use ratio of ChiSqs.
[x] Gamma (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)
[ ] Hypergeometric
- R uses "Computer generation of hypergeometric random variates" by Kachitvichyanukul and Schmeiser, though apparently there's an error in there somewhere.
- Numpy uses ratio-of-uniforms, which seems to be the HRUA algorithm.
[ ] NegativeBinomial
[ ] NoncentralChisq
[ ] Poisson
[ ] Skellam
[x] TDist (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)
- Use Normal/Chisq mixture.

Some of these have been implemented in the src/samplers/ directory. They still need some checking and performance tuning.

The rough idea, if I recall correctly, is to define a sampler method for each distribution that will return a "Sampler" object, which will choose the appropriate algorithm and precompute constants. While this seems to be fairly efficient for large draws from the same distribution, I have noticed problems where the distribution changes on each iteration (as would happen in something like Gibbs sampling), where type instability of the sampler object gives a huge penalty in performance.

andreasnoack commented 9 years ago

We used to generate B variates from two Γ variates. Was there a problem with doing that?

simonbyrne commented 9 years ago

That should work, though R uses a different method.

lindahua commented 9 years ago

One can circumvent the sampler by specializing the rand method for single sample case.

lindahua commented 9 years ago

If we are going this, we should consider allowing users to specify an instance of AbstractRNG.

simonbyrne commented 9 years ago

One can circumvent the sampler by specializing the rand method for single sample case.

@lindahua That seems like the best option so far, though requires duplicating a lot of code.

simonbyrne commented 9 years ago

If we are going this, we should consider allowing users to specify an instance of AbstractRNG.

Definitely, I think the rough idea is to copy the IO interface by making the first argument an instance of AbstractRNG.

lindahua commented 9 years ago

Probably don't need a lot of duplication. One can wrap the core implementation to an internal function (or a small number of functions), both the sampler and the single-sample rand call that function.

andreasnoack commented 9 years ago

I'm surprised that we have stopped using the Julia implementations for Γ, Beta, χ^2. Has there been an issue on that?

lindahua commented 9 years ago

The test/samplers.jl, which calls the test_samples function, has provided a thorough check of all the samplers.

What it does is to generate a million samples from each sampler, and count the occurrences of each value (for discrete distributions) or count the number of samples falling in each small bin (for continuous distributions). For each of this number, it computes the confidence interval (based on Binomial distribution), and check whether the actual number is within the interval.

lindahua commented 9 years ago

Even the exponential distribution relies on Rmath to generate random numbers, even though there is a very simple formula for this.

lindahua commented 9 years ago

It may be that the most obvious/simple way to generate random numbers may not be the optimal way (in terms of efficiency or accuracy).

simonbyrne commented 9 years ago

Even the exponential distribution relies on Rmath to generate random numbers

On master, rand(d::Exponential) now calls Base.Random.randmtzig_exprnd directly.

lindahua commented 9 years ago

good.

Looks like Scipy and Boost can be a source of inspiration when we implement these. Both have very permissive licenses.

rawls238 commented 8 years ago

is this still something that we want to do? has there been any progress on this?

simonbyrne commented 8 years ago

Yes, it is something we want to do.

Some code has been written (see src/samplers/, but it isn't live yet. I outlined some of the obstacles here. Suggestions/contributions welcome.

Nosferican commented 6 years ago

Any updates on this front?

rofinn commented 6 years ago

I did a little bit of testing of the pure julia samplers in src/samplers/ a few months back and 0.6 was still slower than R even if we avoid recreating the sampler on each call. Might be worth testing on 0.7 now though.

ArunS-tack commented 1 year ago

As the Rmath-based rand methods are now broken in 0.4 (https://github.com/JuliaLang/julia/issues/8874), this seems like as good as time as any to implement them in Julia. We need:

[x] Beta (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)

Can generate two Chisqs and a/(a+b), or

R uses "Generating beta variates with nonintegral shape parameters" by R.C.H. Cheng.

[x] Binomial (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)

[x] Chisq (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)

Can use Gamma method.

[x] FDist (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)

Can use ratio of ChiSqs.

[x] Gamma (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)

[ ] Hypergeometric

R uses "Computer generation of hypergeometric random variates" by Kachitvichyanukul and Schmeiser, though apparently there's an error in there somewhere.

Numpy uses ratio-of-uniforms, which seems to be the HRUA algorithm.

[ ] NegativeBinomial

[ ] NoncentralChisq

[ ] Poisson

[ ] Skellam

[x] TDist (Fixed in https://github.com/JuliaStats/Distributions.jl/pull/830)

Use Normal/Chisq mixture.

Some of these have been implemented in the src/samplers/ directory. They still need some checking and performance tuning.

The rough idea, if I recall correctly, is to define a sampler method for each distribution that will return a "Sampler" object, which will choose the appropriate algorithm and precompute constants. While this seems to be fairly efficient for large draws from the same distribution, I have noticed problems where the distribution changes on each iteration (as would happen in something like Gibbs sampling), where type instability of the sampler object gives a huge penalty in performance.

I think we can update a few checks in here. NegativeBinomial,poisson and Skellam have been updated. So that leaves us with noncentrals and Hypergeometric right?

JuliaStats / Distributions.jl

Replace Rmath RNGs #294