JuliaStats / Distributions.jl

A Julia package for probability distributions and associated functions.

Inspect all distributions in Distributions.jl #1116

Open azev77 opened 4 years ago

azev77 commented 4 years ago

Currently there is no convenient way to list all parametric distributions of a given type, so I did it by hand (with help from subtypes(Distribution)).

First find all distributions:

using Distributions, LinearAlgebra, TableView, DataFrames;
using ForwardDiff: Dual  # Dual is needed for the UnivariateGMM entry in MixtureDis
UniDis =[
    Bernoulli(),
    BetaBinomial(2,1,1),
    Binomial(),
    Categorical([.2; .8]),
    DiscreteNonParametric([1;2], [.1;.9]),
    DiscreteUniform(),
    Geometric(),
    Hypergeometric(1,1,1),
    NegativeBinomial(),
    #NoncentralHypergeometric(1,1,1,1,1), #Abstract Type.
    FisherNoncentralHypergeometric(1,1,1,1), 
    WalleniusNoncentralHypergeometric(1,1,1,1),
    Poisson(),
    PoissonBinomial([.2; .8]), #Cleanup
    Skellam(),
]
#
UniCts =[
    Arcsine(),
    Beta(),
    BetaPrime(),
    Biweight(),
    Cauchy(),
    Chernoff(),
    Chi(1),
    Chisq(1),
    Cosine(),
    Epanechnikov(),
    Erlang(),
    Exponential(),
    FDist(1,1),
    Frechet(),
    Gamma(),
    GeneralizedExtremeValue(1,1,1),
    GeneralizedPareto(),
    Gumbel(),      #AKA DoubleExponential(),
    InverseGamma(),
    InverseGaussian(),
    Kolmogorov(),
    KSDist(1),
    KSOneSided(1),
    Laplace(),
    Levy(),
    LocationScale(1,1,Beta()), #LocationScale(μ,σ,ρ::Distribution)
    Logistic(),
    LogitNormal(),
    LogNormal(),
    NoncentralBeta(1,1,1),
    NoncentralChisq(1,1),
    NoncentralF(1,1,1),
    NoncentralT(1,1),
    Normal(), NormalCanon(),
    NormalInverseGaussian(1,1,1,1),
    Pareto(),
    PGeneralizedGaussian(),
    Rayleigh(),
    Semicircle(1),
    StudentizedRange(1,2),
    SymTriangularDist(),
    TDist(1),
    TriangularDist(1,1),
    Triweight(),
    Uniform(),
    VonMises(),
    Weibull()
];
MultiDis =[
    Dirichlet([1]),
    DirichletMultinomial(1, [1;2]),
    Multinomial(1,1),
    MvLogNormal([1]),
    MvNormal([1]), MvNormalCanon([1]),
    MvTDist(1., [1., 2], [4. 2; 2 3]), #GenericMvTDist(df, μ, C) #MvTDist #DiagTDist #IsoTDist
    Product(Uniform.(rand(2), 1)), #Product of N indep Uni 1-dim dist
    VonMisesFisher([1], 1.) #Mean Direction, meandir(d). No mean.
];
MatrixDis =[
    InverseWishart(1, Matrix{Float64}(1.0I, 1, 1)),
    LKJ(1,1),
    MatrixBeta(1,1,1),
    MatrixFDist(1,1, Matrix{Float64}(1.0I, 1, 1)),
    MatrixNormal(1,1),
    MatrixTDist(1., Matrix{Float64}(1.0I, 1, 1), Matrix{Float64}(1.0I, 1, 1), Matrix{Float64}(1.0I, 1, 1)),
    Wishart(1,Matrix{Float64}(1.0I, 1, 1))
];
MixtureDis =[
    MixtureModel(Normal, [(0.0, 1.0), (2.0, 1.0), (-4.0, 1.5)], [0.2, 0.5, 0.3]),
    UnivariateGMM(map(Dual, [0.0, 2.0, -4.0]), map(Dual, [1.0, 1.2, 1.5]), Categorical(map(Dual, [0.2, 0.5, 0.3])))
];
TruncDis =[
    truncated(Exponential(), -3.0, Inf),
    truncated(Normal(0,1),100,115), #TruncatedNormal(mu, sigma, l, u)
    truncated(Uniform(),0.2,0.95)
]
D_all = [UniDis ∪ UniCts ∪ MultiDis ∪ MatrixDis ∪ MixtureDis ∪ TruncDis...]

Now get some properties of each distribution:

StatV = [Distributions.distrname, mean,std,entropy];
s=[try f(D) catch end for D ∈ D_all, f ∈ StatV]
df = DataFrame(D=s[:,1], Mean=s[:,2], Std=s[:,3], Ent=s[:,4] )  # column 3 holds std, not var
showtable(df)

The result looks something like this: [image]

This can be a valuable test. For example, I found NoncentralHypergeometric.jl doesn't work...

Update: NoncentralHypergeometric is an abstract type. I added FisherNoncentralHypergeometric and WalleniusNoncentralHypergeometric, so now there are 83 parametric distributions. The problem is that subtypes(DiscreteUnivariateDistribution) includes NoncentralHypergeometric, but filter(!isabstracttype, subtypes(DiscreteUnivariateDistribution)) does not include FisherNoncentralHypergeometric and WalleniusNoncentralHypergeometric, because subtypes returns only direct subtypes and these two sit one level further down.

Someone on Discourse recommended

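using InteractiveUtils  # provides subtypes outside the REPL

# Recursively collect every non-abstract subtype of x, sorted by name.
function concrete_subtypes(x::Type)::Vector{Type}
    s = subtypes(x)  # direct subtypes only
    sort!(vcat(
            filter(!isabstracttype, s),
            concrete_subtypes.(filter(isabstracttype, s))...),
        by=string)
end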

but concrete_subtypes(DiscreteUnivariateDistribution) also seems to miss stuff...
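To see the difference between a one-level filter and a recursive walk in isolation, here is a minimal sketch on a made-up type tree. Shape, Polygon, Circle, Triangle, Square and leaf_subtypes are hypothetical names for illustration only, not part of Distributions.jl. It does not explain every missing distribution, only the nesting problem described above.

```julia
using InteractiveUtils  # provides subtypes outside the REPL

# Hypothetical toy hierarchy standing in for the Distributions.jl type tree.
abstract type Shape end
abstract type Polygon <: Shape end
struct Circle <: Shape end
struct Triangle <: Polygon end
struct Square <: Polygon end

# Recursively collect every non-abstract descendant of T.
function leaf_subtypes(T::Type)
    out = Type[]
    for S in subtypes(T)
        if isabstracttype(S)
            append!(out, leaf_subtypes(S))  # descend into abstract nodes
        else
            push!(out, S)
        end
    end
    return out
end

direct = filter(!isabstracttype, subtypes(Shape))  # looks one level down only
nested = leaf_subtypes(Shape)                       # walks the whole tree
```

Here direct contains only Circle, while nested also picks up Triangle and Square, which sit under the abstract Polygon. Note that !isabstracttype is a weaker condition than isconcretetype: parametric types such as Normal pass the first check but fail the second.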

johnczito commented 4 years ago

Something like NoncentralHypergeometric(1,1,1,1,1) shouldn't work because NoncentralHypergeometric is an abstract type. The actual concrete subtypes are FisherNoncentralHypergeometric and WalleniusNoncentralHypergeometric (whatever those are), and they seem to work well enough. And they're included in the unit tests.

azev77 commented 4 years ago

In June I posted code on Discourse to fit all relevant Distributions:

using Distributions, Random, HypothesisTests;
using InteractiveUtils  # provides subtypes outside the REPL

Uni = subtypes(UnivariateDistribution)
#Cts_Uni = subtypes(ContinuousUnivariateDistribution)
DGP_True = LogNormal(17,7);
Random.seed!(123);
const d_train = rand(DGP_True, 1_000)
const d_test  = rand(DGP_True, 1_000)

Er =[]; D_fit  =[];
for d in Uni
    println(d)
    try
        D̂ = fit(d, d_train)  # d is already a type, so no Meta.parse/eval round trip is needed
        Score = [loglikelihood(D̂, d_test),
                OneSampleADTest(d_test, D̂)            |> pvalue,
                ApproximateOneSampleKSTest(d_test, D̂) |> pvalue,
                ExactOneSampleKSTest(d_test, D̂)       |> pvalue,
                #PowerDivergenceTest(d_test,lambda=1)  Not working!!!
                JarqueBeraTest(d_test)                |> pvalue   #Only Normal 
        ];
        #Score = loglikelihood(D̂, ds) #TODO: compute a better score.
        push!(D_fit, [d, D̂, Score])
    catch e
        println(e, d)
        push!(Er, (d,e))
    end
end

a=hcat(D_fit...)
M_names =  a[1,:]; M_fit   =  a[2,:]; M_scores = a[3,:];
idx =sortperm(M_scores, rev=true);
Dfit_sort=hcat(M_names[idx], sort(M_scores, rev=true) )
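Note that each entry of M_scores is a vector, and Julia compares vectors lexicographically, so the ranking above is effectively decided by the first entry (the log-likelihood). A minimal sketch that makes this key explicit, using made-up model names and scores rather than real fit results:

```julia
# Hypothetical candidates; each score is [loglikelihood, p-value].
model_names  = ["Normal", "LogNormal", "Gamma"]
model_scores = [[-120.5, 0.01], [-98.2, 0.40], [-101.7, 0.22]]

# Rank by the first entry only (log-likelihood), highest first.
idx = sortperm(model_scores; by = first, rev = true)
ranking = collect(zip(model_names[idx], model_scores[idx]))
```

Sorting with by = first avoids surprises if a later entry, such as a p-value, turns out to be NaN.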

It turns out Mathematica does this too, with FindDistribution[]. [image]

One difference is that Mathematica uses MLE and 4 matching methods (which match sample statistics with population statistics): [image]
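As an illustration of what matching sample statistics with population statistics means, here is a minimal method-of-moments sketch for a Gamma distribution. The helper gamma_mom is a hypothetical name, and this is not a claim about what Mathematica or Distributions.jl does internally.

```julia
using Statistics

# Method-of-moments estimate for Gamma(shape k, scale θ):
# the population mean is kθ and the variance is kθ²,
# so matching them to the sample gives k = m²/v and θ = v/m.
function gamma_mom(x::AbstractVector{<:Real})
    m = mean(x)
    v = var(x)  # sample variance (n - 1 denominator)
    return (shape = m^2 / v, scale = v / m)
end

est = gamma_mom([1.0, 2.0, 3.0, 4.0, 5.0])
```

By construction est.shape * est.scale reproduces the sample mean, which is the sense in which the estimator matches moments rather than maximizing likelihood.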