TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Make MLE and MAP more robust #2279

Closed: itsdfish closed this issue 2 months ago

itsdfish commented 3 months ago

Hello,

For many models, MLE and MAP return results for local rather than global maxima. Sometimes switching optimizers or tweaking optimizer options can help. In my experience, a simple solution is to run the optimizer multiple times and select the maximum from the various attempts. Would it be possible to add a second method for MLE and MAP that allows users to specify the number of attempts? For example,

function maximum_likelihood(model::DynamicPPL.Model, n_reps::Integer, args...; kwargs...)
    # keep the attempt with the highest log probability
    mle = estimate_mode(model, MLE(), args...; kwargs...)
    for _ in 2:n_reps
        _mle = estimate_mode(model, MLE(), args...; kwargs...)
        mle = _mle.lp > mle.lp ? _mle : mle
    end
    return mle
end
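
With this second method in place, picking the best of, say, 10 attempts would look something like the call below (model(rts), lb, and ub are defined in the example that follows):

mle = maximum_likelihood(model(rts), 10; lb, ub)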

Here is an example of the problem using a fairly simple model:

Example

using SequentialSamplingModels
using Random 
using Turing 

Random.seed!(100)

n_samples = 50
rts = rand(ShiftedLogNormal(ν=-1, σ=.8, τ=.3), n_samples)

@model function model(rts; min_rt = minimum(rts))
    ν ~ Normal(-1, 2)
    σ ~ truncated(Normal(.8, 2), 0, Inf)
    τ ~ Uniform(0, min_rt)
    rts ~ ShiftedLogNormal(ν, σ, τ)
    return (;ν, σ, τ)
end

lb = [-1, 0, 0]
ub = [10, 10, minimum(rts)]

# Run MLE 10 times and record the log probability of each fit.
lps = map(_ -> maximum_likelihood(model(rts); lb, ub).lp, 1:10)

# Generate a MAP estimate.
map_estimate = maximum_a_posteriori(model(rts); lb, ub)

Results

10-element Vector{Float64}:
 -12.226295752323539
 -12.226295752292035
 -80.44912241263312
 -12.226295752293689
 -64.40215460329794
 -12.22629575228706
 -12.226295752286504
 -84.43334167030277
 -76.50164720237467
 -55.60874854749243

As you can see, half of the time the algorithm landed on a local maximum.

If this is something you are willing to support, I can submit a PR.

yebai commented 3 months ago

We could perhaps add a multiple-try argument to maximum_a_posteriori and maximum_likelihood.

itsdfish commented 3 months ago

Thank you for your reply. If my proposal for maximum_a_posteriori and maximum_likelihood is acceptable, I can submit a PR.

mhauru commented 3 months ago

Thanks for the proposal @itsdfish! I don't see any harm in adding it as an optional (keyword?) arg with a default of 1. It won't solve the local maximum problem in general, but it does sometimes help.
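
As a rough sketch, the keyword version could look something like this (n_tries is just a placeholder name, not a settled API):

function maximum_likelihood(model::DynamicPPL.Model, args...; n_tries::Integer = 1, kwargs...)
    # run the optimizer n_tries times and keep the fit with the highest log probability
    mle = estimate_mode(model, MLE(), args...; kwargs...)
    for _ in 2:n_tries
        candidate = estimate_mode(model, MLE(), args...; kwargs...)
        mle = candidate.lp > mle.lp ? candidate : mle
    end
    return mle
end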

itsdfish commented 3 months ago

I added the keyword for multiple tries and the unit tests pass. I would like to add an example illustrating that the new keyword works for the example above. I wanted to emulate the shifted lognormal without defining a new type, to keep the example lightweight. Is this possible?

Here was my attempt

Code

using Distributions 
using Random 
using Turing 

Random.seed!(50)

y = rand(LogNormal(-1, 1), 50) .+ .3

@model function lognormal(y, min_obs = minimum(y))
    μ ~ Normal(-1, 2)
    σ ~ truncated(Normal(.8, 2), 0, Inf)
    τ ~ Uniform(0, min_obs)
    _y = y .- τ
    #println(typeof(_y))
    _y ~ LogNormal(μ, σ)
end

lb = [-10, 0, 0]
ub = [10, 10, minimum(y)]

maximum_likelihood(lognormal(y); lb, ub)

It returns the following error:

Error

ERROR: BoundsError: attempt to access 3-element Vector{Float64} at index [4:4]
Stacktrace:
  [1] throw_boundserror(A::Vector{Float64}, I::Tuple{UnitRange{Int64}})
    @ Base ./abstractarray.jl:737
  [2] checkbounds
    @ ./abstractarray.jl:702 [inlined]
  [3] getindex
    @ ./array.jl:973 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/DynamicPPL/ACaKr/src/varinfo.jl:0 [inlined]
  [5] newmetadata(metadata::@NamedTuple{…}, ::Val{…}, x::Vector{…})
    @ DynamicPPL ~/.julia/packages/DynamicPPL/ACaKr/src/varinfo.jl:158
  [6] VarInfo
    @ ~/.julia/packages/DynamicPPL/ACaKr/src/varinfo.jl:116 [inlined]
  [7] unflatten
    @ ~/.julia/packages/DynamicPPL/ACaKr/src/varinfo.jl:137 [inlined]
  [8] unflatten(vi::DynamicPPL.TypedVarInfo{@NamedTuple{…}, Float64}, x::Vector{Float64})
    @ DynamicPPL ~/.julia/packages/DynamicPPL/ACaKr/src/varinfo.jl:134
  [9] estimate_mode(model::DynamicPPL.Model{…}, estimator::MLE, solver::Nothing; initial_params::Nothing, adtype::AutoForwardDiff{…}, cons::Nothing, lcons::Nothing, ucons::Nothing, lb::Vector{…}, ub::Vector{…}, kwargs::@Kwargs{})
    @ Turing.Optimisation ~/.julia/packages/Turing/duwEY/src/optimisation/Optimisation.jl:485
 [10] estimate_mode (repeats 2 times)
    @ ~/.julia/packages/Turing/duwEY/src/optimisation/Optimisation.jl:456 [inlined]
 [11] #maximum_likelihood#13
    @ ~/.julia/packages/Turing/duwEY/src/optimisation/Optimisation.jl:536 [inlined]
 [12] top-level scope
    @ ~/.julia/dev/sandbox/turing_predict/example.jl:26
Some type information was truncated. Use `show(err)` to see complete types.

mhauru commented 2 months ago

I think what happens is that Turing gets confused as to whether _y is observed data or parameters to be optimised. I'm not sure if there's a more general fix that could be implemented in Turing, or at least a better error that could be raised, but you can fix your example by replacing the last two lines of the model with y ~ LogNormal(μ, σ) .+ τ.
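
Concretely, the model with that substitution applied would read:

@model function lognormal(y, min_obs = minimum(y))
    μ ~ Normal(-1, 2)
    σ ~ truncated(Normal(.8, 2), 0, Inf)
    τ ~ Uniform(0, min_obs)
    # shift the distribution rather than the data, so y stays observed
    y ~ LogNormal(μ, σ) .+ τ
end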

itsdfish commented 2 months ago

I'm closing this issue because this functionality is built into MultistartOptimization.jl.
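
For anyone landing here later, a minimal sketch of that package's multistart workflow, roughly following its README (the quadratic objective is only a stand-in, and hooking a Turing model's log density into it is not shown):

using MultistartOptimization, NLopt

# Stand-in objective; in practice this would be the negative log likelihood
# (or negative log joint) of the model being fit.
f(x) = sum(abs2, x)

problem = MinimizationProblem(f, fill(-2.0, 3), fill(2.0, 3))
local_method = NLoptLocalMethod(NLopt.LN_BOBYQA)
result = multistart_minimization(TikTak(100), local_method, problem)
result.location, result.value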