dmetivie / ExpectationMaximization.jl

A simple but generic implementation of Expectation Maximization algorithms to fit mixture models.
https://dmetivie.github.io/ExpectationMaximization.jl/
MIT License

Fitting a beta mixture fails with no method found #9

Open danielinteractive opened 10 months ago

danielinteractive commented 10 months ago

Hi @dmetivie ,

first of all, thanks for this Julia package, and apologies if I make some very naive mistakes here, since this is one of my first times meddling with Julia...

I am trying to fit a mixture of two beta distributions, but it does not find a suffstats method in this case:

using ExpectationMaximization
using Distributions
using StatsPlots

# Try to fit a beta mixture.
N = 50_000
α₁ = 10
β₁ = 5
α₂ = 5
β₂ = 10
π = 0.3

# Mixture Model of two betas.
mix_true = MixtureModel([Beta(α₁, β₁), Beta(α₂, β₂)], [π, 1 - π]) 

# Generate N samples from the mixture.
y = rand(mix_true, N)
histogram(y)

# Initial guess.
mix_guess = MixtureModel([Beta(1, 1), Beta(1, 1)], [0.5, 1 - 0.5])
test = rand(mix_guess, N)

# Fit the MLE with the EM algorithm:
mix_mle = fit_mle(mix_guess, y)
# ERROR: suffstats is not implemented for (Beta{Float64}, Vector{Float64}, Vector{Float64}).

My status:

(@v1.9) pkg> status
Status `~/.julia/environments/v1.9/Project.toml`
  [336ed68f] CSV v0.10.11
  [a93c6f00] DataFrames v1.6.1
  [31c24e10] Distributions v0.25.103
  [e1fe09cc] ExpectationMaximization v0.2.2
  [f3b207a7] StatsPlots v0.15.6
  [fce5fe82] Turing v0.29.3
danielinteractive commented 10 months ago

Could we add our own suffstats method to fix this? At least from https://math.stackexchange.com/questions/321952/beta-distribution-sufficient-statistic it seems this should be relatively straightforward?
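
For reference, the Beta log-density is (α - 1) * log(x) + (β - 1) * log(1 - x) - log(B(α, β)), so the (weighted) sufficient statistics would essentially just be weighted sums of log(x) and log(1 - x), something like (x being the samples and w the weights):

s1 = sum(w .* log.(x))
s2 = sum(w .* log.(1 .- x))
n  = sum(w)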

dmetivie commented 10 months ago

Thanks for your interest in the package.

fit_mle is already defined for the Beta distribution in Distributions.jl, see the function fit_mle(::Type{<:Beta}, x::AbstractArray{T}; ...).

Error messages can be hard to read at first; in your case the REPL (terminal) complains that fit_mle(Beta, y, w) does not exist, i.e. the weighted version (where each sample in y is given a weight):

  [2] suffstats(::Type{Beta{Float64}}, ::Vector{Float64}, ::Vector{Float64})
    @ Distributions C:\Users\metivier\.julia\packages\Distributions\SUTV1\src\genericfit.jl:5
  [3] fit_mle(dt::Type{Beta{Float64}}, x::Vector{Float64}, w::Vector{Float64})

In classical EM, this weighted version is what is used at each M-step. And in the link above, you can check that it is currently not defined (as opposed to some other distributions, like Exponential).

I don't know of an existing fit_mle(Beta, y, w) (and suffstats) implementation. A quick search suggests it does exist in some R packages and papers. Hence, if you manage to code it, you could add it.
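
To make the role of the weighted version concrete, here is a rough sketch of one EM iteration for a univariate mixture (not the package's actual code, just the idea); the M-step is exactly where the missing fit_mle(Beta, y, w) would be called:

using Distributions, Statistics

function em_step(mix::MixtureModel, y::AbstractVector)
    K = ncomponents(mix)
    # E-step: responsibilities γ[i, k] ∝ π_k * pdf of component k at y[i]
    γ = [probs(mix)[k] * pdf(component(mix, k), y[i]) for i in eachindex(y), k in 1:K]
    γ ./= sum(γ, dims = 2)
    # M-step: weighted MLE of each component -- this is the missing fit_mle(Beta, y, w)
    comps = [fit_mle(Beta, y, γ[:, k]) for k in 1:K]
    # update the mixture weights with the average responsibilities
    MixtureModel(comps, vec(mean(γ, dims = 1)))
end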

dmetivie commented 10 months ago

Otherwise you can use the StochasticEM that I implemented (I did some elementary tests, but I am not sure how robust it is). In the case of StochasticEM, only fit_mle(Beta, y) is needed (which Distributions.jl already provides).

using ExpectationMaximization
using Distributions
using Plots
using Random

Random.seed!(1234)

# Try to fit a beta mixture.
N = 50_000
α₁ = 10
β₁ = 5
α₂ = 5
β₂ = 10
π = 0.3

# Mixture Model of two betas.
mix_true = MixtureModel([Beta(α₁, β₁), Beta(α₂, β₂)], [π, 1 - π]) 

# Generate N samples from the mixture.
y = rand(mix_true, N)
stephist(y, label = "true distribution", norm = :pdf)

# Initial guess.
mix_guess = MixtureModel([Beta(1, 1), Beta(1, 1)], [0.5, 1 - 0.5])
test = rand(mix_guess, N)
stephist!(test, label = "guess distribution", norm = :pdf)

# Fit the MLE with the classic EM algorithm:
# mix_mle = fit_mle(mix_guess, y)
# ERROR: suffstats is not implemented for (Beta{Float64}, Vector{Float64}, Vector{Float64}).

mix_mle_SEM = fit_mle(mix_guess, y, method = StochasticEM())
fit_y = rand(mix_mle_SEM, N)
stephist!(fit_y, label = "fitMLE distribution", norm = :pdf)

It works!

danielinteractive commented 10 months ago

Awesome, thanks so much @dmetivie ! The stochastic EM works perfectly fine here.

danielinteractive commented 10 months ago

In classical EM, this weighted version is what is used at each M-step.

And in the link above, you can check that it is currently not defined (as opposed to some other distributions, like Exponential).

I don't know of an existing fit_mle(Beta, y, w) (and suffstats) implementation. A quick search suggests it does exist in some R packages and papers. Hence, if you manage to code it, you could add it.

We might want to look into this soon. When I searched, I did not see any obvious R package implementations or papers; can you maybe share what you found?

My guess would be that we could form a linearly weighted log-likelihood and maximize that.
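
Something along these lines maybe (just a sketch with a made-up helper name fit_beta_weighted, using Optim.jl and a log-scale parametrization to keep α and β positive):

using Distributions, Optim

function fit_beta_weighted(x, w)
    # negative weighted log-likelihood in θ = (log α, log β)
    nll(θ) = -sum(w .* logpdf.(Beta(exp(θ[1]), exp(θ[2])), x))
    res = optimize(nll, [0.0, 0.0], NelderMead())  # start at α = β = 1
    θ̂ = Optim.minimizer(res)
    Beta(exp(θ̂[1]), exp(θ̂[2]))
end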

danielinteractive commented 10 months ago

Hi @dmetivie, I am happy to report that I got the weighted likelihood maximization working for the Beta distribution. However, when using that in the backend for the classic EM algorithm, it does not work correctly: the Beta mixture is not fitted correctly, and the 2 components end up identical. Do you have an idea what is going wrong? Thanks a lot!

using Distributions
import Distributions: fit_mle, suffstats, varm
using SpecialFunctions
using LinearAlgebra

# sufficient statistics - these capture everything of the data we need
struct BetaStats <: SufficientStats
    sum_log_x::Float64 # (weighted) sum of log(x)
    sum_log_1mx::Float64 # (weighted) sum of log(1 - x)
    tw::Float64 # total sample weight
    x_bar::Float64 # (weighted) mean of x
    v_bar::Float64 # (weighted) variance of x
end

function suffstats(::Type{<:Beta}, x::AbstractArray{T}, w::AbstractArray{T}) where T<:Real

    tw = 0.0
    sum_log_x = 0.0 * zero(T)
    sum_log_1mx = 0.0 * zero(T)
    x_bar = 0.0 * zero(T)

    for i in eachindex(x, w)
        @inbounds xi = x[i]
        0 < xi < 1 || throw(DomainError(xi, "samples must be larger than 0 and smaller than 1"))
        @inbounds wi = w[i]
        wi >= 0 || throw(DomainError(wi, "weights must be non-negative"))
        isfinite(wi) || throw(DomainError(wi, "weights must be finite"))
        @inbounds sum_log_x += wi * log(xi)
        @inbounds sum_log_1mx += wi * log(one(T) - xi)
        @inbounds x_bar += wi * xi
        tw += wi
    end
    sum_log_x /= tw   
    sum_log_1mx /= tw
    x_bar /= tw
    v_bar = varm(x, x_bar) # unweighted variance around the weighted mean; only used to initialize the Newton iteration below

    BetaStats(sum_log_x, sum_log_1mx, tw, x_bar, v_bar)
end

# without weights we just put weight 1 for each observation
function suffstats(::Type{<:Beta}, x::AbstractArray{T}) where T<:Real

    w = ones(Float64, length(x))
    suffstats(Beta, x, w)

end

# generic fit function based on the sufficient statistics, on the log scale to be robust
function fit_mle(::Type{<:Beta}, ss::BetaStats;
    maxiter::Int=1000, tol::Float64=1e-14)

    # Initialization of parameters at the moment estimators (I guess)
    temp = ((ss.x_bar * (1 - ss.x_bar)) / ss.v_bar) - 1
    α₀ = ss.x_bar * temp
    β₀ = (1 - ss.x_bar) * temp

    g₁ = ss.sum_log_x
    g₂ = ss.sum_log_1mx

    θ = [log(α₀); log(β₀)]

    converged = false
    t = 0
    while !converged && t < maxiter
        t += 1
        α = exp(θ[1])
        β = exp(θ[2])
        temp1 = digamma(α + β)
        temp2 = trigamma(α + β)
        temp3 = g₁ + temp1 - digamma(α)
        grad = [temp3 * α
                (temp1 + g₂ - digamma(β)) * β]
        hess = [((temp2 - trigamma(α)) * α + temp3) * α             temp2 * β * α
                temp2 * α * β       ((temp2 - trigamma(β)) * β + temp1 + g₂ - digamma(β)) * β  ]
        Δθ = hess\grad #newton step
        θ .-= Δθ
        converged = dot(Δθ,Δθ) < 2*tol #stopping criterion
    end

    α = exp(θ[1])
    β = exp(θ[2])
    return Beta(α, β)
end

# then define methods for the original data
fit_mle(::Type{<:Beta}, x::AbstractArray{T}; maxiter::Int=1000, tol::Float64=1e-14) where T<:Real = fit_mle(Beta, suffstats(Beta, x); maxiter=maxiter, tol=tol)
fit_mle(::Type{<:Beta}, x::AbstractArray{T}, w::AbstractArray{T}; maxiter::Int=1000, tol::Float64=1e-14) where T<:Real = fit_mle(Beta, suffstats(Beta, x, w); maxiter=maxiter, tol=tol)

# now let's try it out (xtest here is just some example data on (0, 1)):
xtest = rand(Beta(2, 3), 8)

fit_mle(Beta, xtest)
fit_mle(Beta, xtest, ones(Float64, 8))
Distributions.fit(Beta, xtest)
wtest = rand(Uniform(), 8)
fit_mle(Beta, xtest, wtest)

# Now finally having this in place the classic EM algorithm should work:
using ExpectationMaximization
using Distributions
using Plots
using Random

Random.seed!(1234)

# Try to fit a beta mixture.
N = 50_000
α₁ = 10
β₁ = 5
α₂ = 5
β₂ = 10
π = 0.3

# Mixture Model of two betas.
mix_true = MixtureModel([Beta(α₁, β₁), Beta(α₂, β₂)], [π, 1 - π]) 

# Generate N samples from the mixture.
y = rand(mix_true, N)
stephist(y, label = "true distribution", norm = :pdf)

# Initial guess.
mix_guess = MixtureModel([Beta(1, 1), Beta(1, 1)], [0.5, 1 - 0.5])
test = rand(mix_guess, N)
stephist!(test, label = "guess distribution", norm = :pdf)

# Fit the MLE with the classic EM algorithm:
mix_mle = fit_mle(mix_guess, y)
fit_y = rand(mix_mle, N)
stephist!(fit_y, label = "fitMLE distribution", norm = :pdf)
# This does not seem to work? Why?

# Test first if we can downweight large values and get expected result.
test_weights = Float64.(ifelse.(y .> 0.5, 0.01, 1))
low_res = fit_mle(Beta, y, test_weights)

fit_low_res = rand(low_res, N)
stephist!(fit_low_res, label = "fit_low_res", norm = :pdf)

# Same for high values
test_weights2 = Float64.(ifelse.(y .< 0.5, 0.01, 1))
high_res = fit_mle(Beta, y, test_weights2)

fit_high_res = rand(high_res, N)
stephist!(fit_high_res, label = "fit_high_res", norm = :pdf)
# So that seems to work fine.
danielinteractive commented 10 months ago

Just tested again in a fresh folder, and the problem can be boiled down to this plot: [plot comparing the blue true mixture and the green fitted mixture]. The green histogram is the fit from fit_mle() applied with a 2-component mixture, with the data coming from the blue, true 2-component mixture. The fit is the same beta distribution twice:

MixtureModel{Beta{Float64}}(K = 2)
components[1] (prior = 0.5000): Beta{Float64}(α=2.582748230188013, β=3.3169631350299884)
components[2] (prior = 0.5000): Beta{Float64}(α=2.582748230188013, β=3.3169631350299884)

On the other hand, heavily downweighting the observations above 0.5 and running fit_mle() with a single beta distribution yields fit_low_res, i.e. the pink distribution, which makes sense, and vice versa for fit_high_res, i.e. the light brown distribution. @dmetivie any hints are much appreciated :)

dmetivie commented 10 months ago

Sorry for the delay, I was busy with other stuff!

What you observed is actually a common pitfall of the EM algorithm. With the poor mix_guess initial value, the EM gets stuck in a poor local maximum of the likelihood. The stochastic version actually manages to escape that!

You can check that a better initial condition like

mix_guess_2 = MixtureModel([Beta(10, 3), Beta(5, 12)], [0.3, 1 - 0.3])

converges to the true distribution.

With the kwarg infos = true you can extract the log-likelihood of the algorithm and select the best one across different initializations. I made something to facilitate that: just pass a vector of initial guesses. It will choose the one with the best likelihood (plus it should suppress convergence failures, see issue #11):

fit_mle([mix_guess, mix_guess_2], y)

I should update the documentation and README.

danielinteractive commented 10 months ago

Thanks @dmetivie, that is super helpful! In that case, I guess I could even pass mixtures with different numbers of components in the vector to fit_mle()? And it would pick the one with the highest log-likelihood? (Hm, although then we might overfit, I guess, if we don't use BIC or something else that penalizes the number of parameters.)

dmetivie commented 10 months ago

Yes indeed, in that case BIC, AIC, etc. should be used. One would need to define something like bic(::MixtureModel, y). If that were defined, it could become an option for model selection in fit_mle.
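
Something like this could be a starting point (bic_score and n_free_params are just hypothetical names for this sketch, not functions of the package; the mixture weights contribute K - 1 free parameters):

using Distributions

n_free_params(mix::MixtureModel) =
    sum(length(params(c)) for c in components(mix)) + ncomponents(mix) - 1

bic_score(mix::MixtureModel, y::AbstractVector) =
    n_free_params(mix) * log(length(y)) - 2 * loglikelihood(mix, y)

# e.g. pick among candidate fits with different numbers of components:
# best = argmin(m -> bic_score(m, y), candidate_fits)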