SciRuby / sciruby

Tools for scientific computation in Ruby
http://gems.sciruby.com

Complete set of common probability distributions #5

translunar closed this issue 9 years ago

translunar commented 13 years ago

Claudio Bustos' distribution gem supplies the probability distributions for SciRuby.

Some of these have already been implemented (e.g., normal, chi-square, hypergeometric, logistic, F, exponential, binomial, bivariate normal, Poisson, Student's t, beta, gamma).

Others have not. For example, multivariate normal and lognormal are both needed.
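For reference, the lognormal density and CDF have closed forms in terms of the normal distribution, so a pure-Ruby version is short. The sketch below assumes `mu` and `sigma` are the mean and standard deviation of ln X; `LognormalSketch` and its method names are hypothetical and do not follow the distribution gem's actual layout. The CDF relies on Ruby's built-in `Math.erf`.

```ruby
# Hypothetical pure-Ruby sketch of the lognormal distribution.
# X = exp(Y) where Y ~ Normal(mu, sigma); not the distribution gem's API.
module LognormalSketch
  module_function

  # Density: exp(-z^2 / 2) / (x * sigma * sqrt(2*pi)), with z = (ln x - mu) / sigma
  def pdf(x, mu = 0.0, sigma = 1.0)
    return 0.0 if x <= 0 # support is x > 0
    z = (Math.log(x) - mu) / sigma
    Math.exp(-0.5 * z * z) / (x * sigma * Math.sqrt(2 * Math::PI))
  end

  # P(X <= x) = Phi((ln x - mu) / sigma), where Phi is the standard normal CDF
  def cdf(x, mu = 0.0, sigma = 1.0)
    return 0.0 if x <= 0
    z = (Math.log(x) - mu) / sigma
    0.5 * (1 + Math.erf(z / Math.sqrt(2)))
  end
end

puts LognormalSketch.cdf(1.0) # prints 0.5 (the median of the standard lognormal is 1)
```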

See a list of already-implemented distributions. Make sure to look at existing distributions for a template. The goal is to eventually implement each in Java, pure Ruby, and C (i.e., GSL or statistics2).

One difficulty is license compatibility. If code is GPLed, it cannot go directly into SciRuby. Claudio's distribution gem is currently under the GPL, mainly because some of the distributions are derived from GSL code (which is itself GPL). It would be best to rewrite those distributions (eventually) based on academic papers or other material that isn't subject to the GPL, because we want to be moving toward BSD/MIT compatibility.

This goes for new distributions as well. If you can only get code from GSL, see if you can reach out to the original author of the code in question. Find out how he or she would feel about us incorporating it into SciRuby under a more liberal license. Please document any conversations you may have, particularly if you're able to reach an author and he or she gives permission.

One idea for finding a list of common probability distributions: search arxiv.org for usage of the names of distributions, like so: https://www.google.com/search?q=site%3Aarxiv.org+%22gamma+distribution%22

A search for "gamma distribution" returns about 2,500 results, while "Poisson distribution" returns over 10,000. You could use counts like these to get an idea of which distributions are most widely used.

dennyabraham commented 13 years ago

I went through the remaining distributions available in GSL but not yet implemented in the distribution gem and counted their arxiv.org mentions, including synonymous names where possible. The table is ranked by number of mentions. Some distributions are listed twice because searching with synonymous names yielded many more results, particularly when a synonym is a generic term (e.g., "uniform distribution").

| arXiv mentions | Name | Other names | Notes |
|---:|---|---|---|
| 590 | Flat (Uniform) Distribution | | Including 'Uniform Distribution' |
| 370 | Lognormal Distribution | Galton Distribution | |
| 174 | Laplace Distribution | Gumbel Distribution, Double Exponential Distribution | |
| 125 | Levy alpha-Stable Distributions | Stable Distribution | Including 'Stable Distribution' |
| 85 | Levy skew alpha-Stable Distribution | Levy Distribution, Van der Waals profile | Including 'Levy Distribution' |
| 74 | Geometric Distribution | | |
| 66 | Pareto Distribution | Bradford Distribution | |
| 64 | Negative Binomial Distribution | | |
| 63 | Flat (Uniform) Distribution | | Not Including 'Uniform Distribution' |
| 61 | Weibull Distribution | | |
| 47 | Dirichlet Distribution | | |
| 39 | Cauchy Distribution | Lorentz Distribution, Breit–Wigner Distribution | |
| 31 | Bernoulli Distribution | | |
| 31 | Multinomial Distribution | | |
| 26 | Rayleigh Distribution | | |
| 18 | Logarithmic Distribution | logarithmic series distribution, log-series distribution | |
| 7 | Exponential Power Distribution | Generalized Gaussian Distribution, Generalized Normal Distribution | |
| 6 | Landau Distribution | | |
| 4 | Logistic Distribution | | |
| 2 | Gaussian Tail Distribution | | |
| 2 | General Discrete Distributions | | |
| 2 | Pascal Distribution | | |
| 2 | Levy alpha-Stable Distributions | Stable Distribution | Not Including 'Stable Distribution' |
| 1 | Levy skew alpha-Stable Distribution | Levy Distribution, Van der Waals profile | Not Including 'Levy Distribution' |
| 0 | Rayleigh Tail Distribution | | |
| 0 | Spherical Vector Distributions | | |
| 0 | Type-1 Gumbel Distribution | | |
| 0 | Type-2 Gumbel Distribution | | |

lstrzebinczyk commented 12 years ago

Is there a need for anything more than a density function, a distribution function, a characteristic function, some parameters for each distribution, and maybe a plot? If not, it shouldn't be too hard to implement everything from scratch.
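As a rough illustration of that point, a distribution with closed-form expressions needs little beyond a few formulas. Below is a hypothetical pure-Ruby sketch of the Laplace distribution (from the list above) exposing its parameters, density, distribution function, and characteristic function; `LaplaceSketch` and its method names are illustrative assumptions, not an existing API.

```ruby
# Hypothetical sketch: Laplace distribution with location mu and scale b.
class LaplaceSketch
  attr_reader :mu, :b # location and scale parameters

  def initialize(mu = 0.0, b = 1.0)
    raise ArgumentError, "scale b must be positive" unless b > 0
    @mu = mu.to_f
    @b = b.to_f
  end

  # Density: exp(-|x - mu| / b) / (2b)
  def pdf(x)
    Math.exp(-(x - mu).abs / b) / (2 * b)
  end

  # Distribution function, split at the location parameter.
  def cdf(x)
    if x < mu
      0.5 * Math.exp((x - mu) / b)
    else
      1 - 0.5 * Math.exp(-(x - mu) / b)
    end
  end

  # Characteristic function: E[exp(itX)] = exp(i * mu * t) / (1 + b^2 t^2)
  def characteristic(t)
    Complex.polar(1.0, mu * t) / (1 + b * b * t * t)
  end

  def mean
    mu
  end

  def variance
    2 * b * b
  end
end
```

For example, `LaplaceSketch.new(0.0, 2.0).cdf(0.0)` returns `0.5`. Distributions without closed-form distribution functions (the gamma CDF, for instance, needs an incomplete gamma function) would take more work than this.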

boutil commented 12 years ago

The Stan project (http://mc-stan.org/) has a BSD-3-clause license and includes several of the distributions on the list as built-ins, with their expressions given explicitly in the documentation: http://stan.googlecode.com/files/stan-reference-1.0.2.pdf.

I guess the license-compatibility argument only holds for distributions with no explicit density function, which therefore need some particular algorithm. Otherwise, it is hard to believe that GSL's GPL also covers the mathematical expression describing a distribution.

translunar commented 12 years ago

Ahh, thanks. This is helpful.

Unfortunately, licenses cover the approximations used for various functions. You nearly always need to use an approximation.
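To make this concrete: even the normal CDF needs an approximation of the error function. One option taken from the published literature rather than from GPL code is formula 7.1.26 in Abramowitz and Stegun's Handbook of Mathematical Functions, with a maximum absolute error of about 1.5e-7. The sketch below assumes that accuracy is acceptable; `ErfApprox` is a hypothetical name, and Ruby's built-in `Math.erf` may be the better choice in practice.

```ruby
# Hypothetical sketch: erf via Abramowitz & Stegun formula 7.1.26
# (|error| <= 1.5e-7 for x >= 0; odd symmetry handles x < 0).
module ErfApprox
  P  = 0.3275911
  A1 = 0.254829592
  A2 = -0.284496736
  A3 = 1.421413741
  A4 = -1.453152027
  A5 = 1.061405429

  module_function

  def erf(x)
    sign = x < 0 ? -1.0 : 1.0 # erf(-x) = -erf(x)
    x = x.abs
    t = 1.0 / (1.0 + P * x)
    # Horner evaluation of a1*t + a2*t^2 + a3*t^3 + a4*t^4 + a5*t^5
    poly = t * (A1 + t * (A2 + t * (A3 + t * (A4 + t * A5))))
    sign * (1.0 - poly * Math.exp(-x * x))
  end

  # Standard normal CDF built on the approximation above.
  def normal_cdf(z)
    0.5 * (1.0 + erf(z / Math.sqrt(2.0)))
  end
end
```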

boris-s commented 11 years ago

The categorical distribution is missing from the list. It's not something that can just be omitted.
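A categorical distribution is straightforward to sketch in pure Ruby: normalize the weights, keep running totals, and sample by inverting the CDF. `CategoricalSketch` is a hypothetical name, and the categories are assumed to be the integers 0 through k-1.

```ruby
# Hypothetical sketch of a categorical distribution over categories 0...k.
class CategoricalSketch
  attr_reader :probs

  def initialize(weights)
    raise ArgumentError, "need at least one category" if weights.empty?
    raise ArgumentError, "weights must be non-negative" if weights.any?(&:negative?)
    total = weights.sum.to_f
    raise ArgumentError, "weights must not all be zero" unless total > 0
    @probs = weights.map { |w| w / total } # normalize so probabilities sum to 1
    # Running totals, used by cdf and by inverse-CDF sampling.
    acc = 0.0
    @cumulative = @probs.map { |p| acc += p }
  end

  # Probability mass at integer category k (0 outside the support).
  def pmf(k)
    k.between?(0, probs.size - 1) ? probs[k] : 0.0
  end

  # P(X <= k)
  def cdf(k)
    return 0.0 if k < 0
    return 1.0 if k >= probs.size - 1 # avoid floating-point round-off at the top
    @cumulative[k]
  end

  # Inverse-CDF sampling: first category whose running total exceeds u.
  def sample(rng = Random.new)
    u = rng.rand # uniform in [0, 1)
    @cumulative.bsearch_index { |c| c > u } || probs.rindex { |p| p > 0 }
  end
end
```

For example, `CategoricalSketch.new([1, 2, 1]).pmf(1)` returns `0.5`; passing a seeded `Random` to `sample` makes draws reproducible.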

vpereira commented 11 years ago

Related to this, I started writing something with JRuby and Commons; you can find it here: https://github.com/vpereira/distribution/tree/jruby_support (warning: it probably isn't working, but you can read the code and see what I'm trying to do :)). It also supports GSL, but I'm willing to drop MRI support entirely and work only on top of JRuby. There are some really good Java libraries with the licenses we need: JScience (BSD) and Commons (Apache). Besides, the Ruby GSL support isn't complete and, well, GPL isn't the way to go.