Open danielinteractive opened 10 months ago
Could we add our own suffstats
method to fix this?
At least from https://math.stackexchange.com/questions/321952/beta-distribution-sufficient-statistic it seems this should be relatively straightforward?
Thanks for your interest in the package.
fit_mle
already is defined for Beta
distribution in Distributions.jl
see function` fit_mle(::Type{<:Beta}, x::AbstractArray{T};.
Error messages can be hard to read at first, in your case the REPL (terminal) complains that fit_mle(Beta, y, w)
does not exist, i.e. the weighted version (where each sample in y is given a weight).
[2] suffstats(::Type{Beta{Float64}}, ::Vector{Float64}, ::Vector{Float64})
@ Distributions C:\Users\metivier\.julia\packages\Distributions\SUTV1\src\genericfit.jl:5
[3] fit_mle(dt::Type{Beta{Float64}}, x::Vector{Float64}, w::Vector{Float64})
In classical EM this is what version that is used at each M
steps.
And in the link above, you can check that it is currently not defined (as opposed to some other distributions, like Exponential
).
I don't know about fit_mle(Beta, y, w)
(and suffstats
). A quick search seems to say it does exist in some R
packages and papers. Hence if you manage to code it, you could add it
Distributions.jl
-> open source contribution! ExpectationMaximization.jl
will work.Otherwise you can use the StochasticEM
that I implemented (I did some elementary test but I am not sure how robust it is).
In the case of StochasticEM
only fit_mle(Beta, y)
is needed.
using ExpectationMaximization
using Distributions
using Plots
using Random
Random.seed!(1234)
# Try to fit a beta mixture.
N = 50_000
α₁ = 10
β₁ = 5
α₂ = 5
β₂ = 10
π = 0.3
# Mixture Model of two betas.
mix_true = MixtureModel([Beta(α₁, β₁), Beta(α₂, β₂)], [π, 1 - π])
# Generate N samples from the mixture.
y = rand(mix_true, N)
stephist(y, label = "true distribution", norm = :pdf)
# Initial guess.
mix_guess = MixtureModel([Beta(1, 1), Beta(1, 1)], [0.5, 1 - 0.5])
test = rand(mix_guess, N)
stephist!(test, label = "guess distribution", norm = :pdf)
# Fit the MLE with the classic EM algorithm:
# mix_mle = fit_mle(mix_guess, y)
# ERROR: suffstats is not implemented for (Beta{Float64}, Vector{Float64}, Vector{Float64}).
mix_mle_SEM = fit_mle(mix_guess, y, method = StochasticEM())
fit_y = rand(mix_mle_SEM, N)
stephist!(fit_y, label = "fitMLE distribution", norm = :pdf)
it works
Awesome, thanks so much @dmetivie ! The stochastic EM works perfectly fine here.
In classical EM this is what version that is used at each
M
steps.And in the link above, you can check that it is currently not defined (as opposed to some other distributions, like
Exponential
).I don't know about
fit_mle(Beta, y, w)
(andsuffstats
). A quick search seems to say it does exist in someR
packages and papers. Hence if you manage to code it, you could add it
We might want to look into this soon - when I searched I did not see any obvious R package implementations or papers - can you share maybe what you found?
My guess would be we could do a linearly weighted log likelihood and maximize that.
Hi @dmetivie , so I am happy that I got the weighted likelihood maximization working for the Beta distribution. However, somehow when using that in the backend for the classic EM algorithm it does not work correctly - the Beta mixture is not fitted correctly but the 2 components are identical. Do you have an idea what is going wrong? Thanks a lot!
using Distributions
import Distributions: fit_mle, varm
using SpecialFunctions
using LinearAlgebra
# sufficient statistics - these capture everything of the data we need
struct BetaStats <: SufficientStats
sum_log_x::Float64 # (weighted) sum of log(x)
sum_log_1mx::Float64 # (weighted) sum of log(1 - x)
tw::Float64 # total sample weight
x_bar::Float64 # (weighted) mean of x
v_bar::Float64 # (weighted) variance of x
end
function suffstats(::Type{<:Beta}, x::AbstractArray{T}, w::AbstractArray{T}) where T<:Real
tw = 0.0
sum_log_x = 0.0 * zero(T)
sum_log_1mx = 0.0 * zero(T)
x_bar = 0.0 * zero(T)
for i in eachindex(x, w)
@inbounds xi = x[i]
0 < xi < 1 || throw(DomainError(xi, "samples must be larger than 0 and smaller than 1"))
@inbounds wi = w[i]
wi >= 0 || throw(DomainError(wi, "weights must be non-negative"))
isfinite(wi) || throw(DomainError(wi, "weights must be finite"))
@inbounds sum_log_x += wi * log(xi)
@inbounds sum_log_1mx += wi * log(one(T) - xi)
@inbounds x_bar += wi * xi
tw += wi
end
sum_log_x /= tw
sum_log_1mx /= tw
x_bar /= tw
v_bar = varm(x, x_bar)
BetaStats(sum_log_x, sum_log_1mx, tw, x_bar, v_bar)
end
# without weights we just put weight 1 for each observation
function suffstats(::Type{<:Beta}, x::AbstractArray{T}) where T<:Real
w = ones(Float64, length(x))
suffstats(Beta, x, w)
end
# generic fit function based on the sufficient statistics, on the log scale to be robust
function fit_mle(::Type{<:Beta}, ss::BetaStats;
maxiter::Int=1000, tol::Float64=1e-14)
# Initialization of parameters at the moment estimators (I guess)
temp = ((ss.x_bar * (1 - ss.x_bar)) / ss.v_bar) - 1
α₀ = ss.x_bar * temp
β₀ = (1 - ss.x_bar) * temp
g₁ = ss.sum_log_x
g₂ = ss.sum_log_1mx
θ= [log(α₀) ; log(β₀)]
converged = false
t=0
while !converged && t < maxiter
t += 1
α = exp(θ[1])
β = exp(θ[2])
temp1 = digamma(α + β)
temp2 = trigamma(α + β)
temp3 = g₁ + temp1 - digamma(α)
grad = [temp3 * α
(temp1 + g₂ - digamma(β)) * β]
hess = [((temp2 - trigamma(α)) * α + temp3) * α temp2 * β * α
temp2 * α * β ((temp2 - trigamma(β)) * β + temp1 + g₂ - digamma(β)) * β ]
Δθ = hess\grad #newton step
θ .-= Δθ
converged = dot(Δθ,Δθ) < 2*tol #stopping criterion
end
α = exp(θ[1])
β = exp(θ[2])
return Beta(α, β)
end
# then define methods for the original data
fit_mle(::Type{<:Beta}, x::AbstractArray{T}; maxiter::Int=1000, tol::Float64=1e-14) where T<:Real = fit_mle(Beta, suffstats(Beta, x))
fit_mle(::Type{<:Beta}, x::AbstractArray{T}, w::AbstractArray{T}; maxiter::Int=1000, tol::Float64=1e-14) where T<:Real = fit_mle(Beta, suffstats(Beta, x, w))
# now let's try it out:
fit_mle(Beta, xtest)
fit_mle(Beta, xtest, ones(Float64, 8))
Distributions.fit(Beta, xtest)
wtest = rand(Uniform(), 8)
fit_mle(Beta, xtest, wtest)
# Now finally having this in place the classic EM algorithm should work:
using ExpectationMaximization
using Distributions
using Plots
using Random
Random.seed!(1234)
# Try to fit a beta mixture.
N = 50_000
α₁ = 10
β₁ = 5
α₂ = 5
β₂ = 10
π = 0.3
# Mixture Model of two betas.
mix_true = MixtureModel([Beta(α₁, β₁), Beta(α₂, β₂)], [π, 1 - π])
# Generate N samples from the mixture.
y = rand(mix_true, N)
stephist(y, label = "true distribution", norm = :pdf)
# Initial guess.
mix_guess = MixtureModel([Beta(1, 1), Beta(1, 1)], [0.5, 1 - 0.5])
test = rand(mix_guess, N)
stephist!(test, label = "guess distribution", norm = :pdf)
# Fit the MLE with the classic EM algorithm:
mix_mle = fit_mle(mix_guess, y)
fit_y = rand(mix_mle, N)
stephist!(fit_y, label = "fitMLE distribution", norm = :pdf)
# This does not seem to work? Why?
# Test first if we can downweight large values and get expected result.
test_weights = Float64.(ifelse.(y .> 0.5, 0.01, 1))
low_res = fit_mle(Beta, y, test_weights)
fit_low_res = rand(low_res, N)
stephist!(fit_low_res, label = "fit_low_res", norm = :pdf)
# Same for high values
test_weights2 = Float64.(ifelse.(y .< 0.5, 0.01, 1))
high_res = fit_mle(Beta, y, test_weights2)
fit_high_res = rand(high_res, N)
stephist!(fit_high_res, label = "fit_high_res", norm = :pdf)
# So that seems to work fine.
Just tested again in a fresh folder, and the problem can be boiled down to this plot:
The green histogram is the fit from fit_mle()
applied with a 2 components mixture and the data coming from the blue, true 2 components mixture. It is the same beta distribution twice:
MixtureModel{Beta{Float64}}(K = 2)
components[1] (prior = 0.5000): Beta{Float64}(α=2.582748230188013, β=3.3169631350299884)
components[2] (prior = 0.5000): Beta{Float64}(α=2.582748230188013, β=3.3169631350299884)
On the other hand, downweighting the observations above 0.5 a lot and running fit_mle()
with a single beta distribution yields the fit_low_res
i.e. pink distribution which makes sense, and vice versa for fit_high_res
i.e. the light brown distribution.
@dmetivie any hints are much appreciated :)
Sorry for delay I was busy with other stuff!
What you observed is actually a common pitfall of the EM algo. With the poor mix_guess
initial value the EM get stuck into some poor local maximal for the likelihood. The Stochastic version actually managed to escape that!
You can check a better initial condition like
mix_guess_2 = MixtureModel([Beta(10, 3), Beta(5, 12)], [0.3, 1 - 0.3])
converges to the true distribution.
With the kwarg infos = true
you can extract the loglikelihood of the algo and select the best one for different initialization. I made a something to facilitate that, just pass a vector of mix_guess
. It will choose the one with the best likelihood (+ it should suppress convergence failure, see #11 issue)
fit_mle([mix_gess, mix_guess_2], y)
I should update documentation and readme.
Thanks @dmetivie , that is super helpful!
In that case, I guess I could also even pass mixtures with different number of components in the vector to fit_mle()
? And it will pick the one with the highest log-likelihood? (hm, although then we might overfit I guess, if we don't use BIC or something else that penalizes the number of parameters)
Yes indeed, in that case bic aic etc should be used. One should define something like bic(::MixtureModel,y)
.
If that is defined then it could be an option in the selection of the model in fit_mle
Hi @dmetivie ,
first of all thanks for this Julia package, and apologies if I make some very naive mistakes here, since this is one of my first meddling with Julia ever...
I am trying to fit a mixture of two beta distributions, but it does not find
suffstats
in this case:My status: