Julia-Tempering / Pigeons.jl

Sampling from intractable distributions, with support for distributed and parallel methods
https://pigeons.run/dev/
GNU Affero General Public License v3.0

How to deal with log likelihood of -Inf from parameters in support #264

Closed · itsdfish closed this issue 2 months ago

itsdfish commented 2 months ago

Hello,

I am experiencing an issue with the way the slice sampler handles a log likelihood of -Inf. I have a model that computes a vector of probabilities using the softmax function and passes it to a multinomial distribution object. One issue is that large inputs to the softmax function produce NaN due to numerical overflow. I handle this by returning -Inf from the logpdf, which triggers the following error:
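
For illustration, here is a minimal example of the overflow (a sketch, not the actual UtilityModels.jl code):

```julia
# naive softmax overflows: exp(1000) == Inf, and Inf / Inf == NaN
naive_softmax(x) = exp.(x) ./ sum(exp.(x))
naive_softmax([1000.0, 0.0])   # returns [NaN, 0.0]

# the standard fix shifts by the maximum before exponentiating
stable_softmax(x) = (e = exp.(x .- maximum(x)); e ./ sum(e))
stable_softmax([1000.0, 0.0])  # returns [1.0, 0.0]
```

Note that even the stable version can return probabilities that are exactly zero, so taking logs downstream can still produce -Inf.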

Error 1

ERROR: SliceSampler supports contrained target, but the sampler should be initialized in the support: DynamicPPL.TypedVarInfo{@NamedTuple{β::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:β, Setfield.IdentityLens}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:β, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, γ::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:γ, Setfield.IdentityLens}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:γ, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, δ::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:δ, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:δ, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}((β = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:β, Setfield.IdentityLens}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:β, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(β => 1), [β], UnitRange{Int64}[1:1], [1.9656158502359227], Gamma{Float64}[Gamma{Float64}(α=0.2, θ=4.0)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String, BitVector}("del" => [0], "trans" => [1])), γ = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:γ, Setfield.IdentityLens}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:γ, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(γ => 1), [γ], UnitRange{Int64}[1:1], [0.497672447968838], Gamma{Float64}[Gamma{Float64}(α=0.2, θ=4.0)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String, BitVector}("del" => [0], "trans" => [1])), δ = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:δ, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:δ, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(δ => 1), [δ], UnitRange{Int64}[1:1], [-1.0817985744648773], Normal{Float64}[Normal{Float64}(μ=0.0, σ=2.0)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String, BitVector}("del" => [0], "trans" => [1]))), Base.RefValue{Float64}(-Inf), Base.RefValue{Int64}(1))
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] cached_log_potential(log_potential::InterpolatedLogPotential{…}, state::DynamicPPL.TypedVarInfo{…}, cached_lp::Float64)
    @ Pigeons ~/.julia/dev/Pigeons/src/explorers/SliceSampler.jl:36
  [3] slice_sample!(h::SliceSampler, state::Vector{…}, log_potential::InterpolatedLogPotential{…}, cached_lp::Float64, replica::Pigeons.Replica{…})
    @ Pigeons ~/.julia/dev/Pigeons/src/explorers/SliceSampler.jl:44
  [4] slice_sample!
    @ ~/.julia/dev/Pigeons/ext/PigeonsDynamicPPLExt/state.jl:67 [inlined]
  [5] step!(explorer::SliceSampler, replica::Pigeons.Replica{DynamicPPL.TypedVarInfo{…}, @NamedTuple{…}}, shared::Shared{...})
    @ Pigeons ~/.julia/dev/Pigeons/src/explorers/SliceSampler.jl:28
  [6] explore!(pt::PT{...}, replica::Pigeons.Replica{DynamicPPL.TypedVarInfo{…}, @NamedTuple{…}}, explorer::SliceSampler)
    @ Pigeons ~/.julia/dev/Pigeons/src/pt/pigeons.jl:107
  [7] explore!
    @ ~/.julia/dev/Pigeons/src/pt/pigeons.jl:96 [inlined]
  [8] macro expansion
    @ ~/.julia/dev/Pigeons/src/pt/pigeons.jl:50 [inlined]
  [9] macro expansion
    @ ./timing.jl:503 [inlined]
 [10] run_one_round!(pt::PT{...})
    @ Pigeons ~/.julia/dev/Pigeons/src/pt/pigeons.jl:49
 [11] pigeons(pt::PT{...})
    @ Pigeons ~/.julia/dev/Pigeons/src/pt/pigeons.jl:18
 [12] pigeons(pt_arguments::Inputs{TuringLogPotential{...}, Nothing, Nothing, Nothing, Nothing}, ::Pigeons.ThisProcess)
    @ Pigeons ~/.julia/dev/Pigeons/src/api.jl:19
 [13] #pigeons#166
    @ ~/.julia/dev/Pigeons/src/api.jl:16 [inlined]
 [14] top-level scope
    @ ~/.julia/dev/sandbox/pigeons_issue/run_pigeons_issue.jl:35
Some type information was truncated. Use `show(err)` to see complete types.

Upon inspection of the sampled parameters, I verified that they were valid. The problem was numerical overflow of the softmax function. I tried two workarounds, which led to the same error below. My first workaround was to constrain the prior on a parameter to minimize the chance of numerical overflow. My second workaround was to return a negative number with a large magnitude instead of -Inf.

Error 2

δ = 0.7285379080643255, γ = 0.025087157296621306, β = 3.7406e-320, logpdf -3.1248093158291352
ERROR: Got an invalid log density after updating state at index 1:
- log density = Inf
- state[1]   = -743.6643604239493
Dumping full replica state:
DynamicPPL.TypedVarInfo{@NamedTuple{β::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:β, Setfield.IdentityLens}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:β, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, γ::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:γ, Setfield.IdentityLens}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:γ, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, δ::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:δ, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:δ, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}((β = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:β, Setfield.IdentityLens}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:β, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(β => 1), [β], UnitRange{Int64}[1:1], [-743.6643604239493], Gamma{Float64}[Gamma{Float64}(α=0.1, θ=8.0)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String, BitVector}("del" => [0], "trans" => [1])), γ = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:γ, Setfield.IdentityLens}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:γ, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(γ => 1), [γ], UnitRange{Int64}[1:1], [-3.6853992252769974], Gamma{Float64}[Gamma{Float64}(α=0.1, θ=8.0)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String, BitVector}("del" => [0], "trans" => [1])), δ = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:δ, Setfield.IdentityLens}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:δ, Setfield.IdentityLens}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(δ => 1), [δ], UnitRange{Int64}[1:1], [0.7285379080643255], Normal{Float64}[Normal{Float64}(μ=0.0, σ=2.0)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String, BitVector}("del" => [0], "trans" => [1]))), Base.RefValue{Float64}(-83.646672847717), Base.RefValue{Int64}(1282))

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:35
  [2] slice_sample!(h::SliceSampler, state::Vector{…}, log_potential::Pigeons.InterpolatedLogPotential{…}, cached_lp::Float64, replica::Pigeons.Replica{…})
    @ Pigeons ~/.julia/dev/Pigeons/src/explorers/SliceSampler.jl:53
  [3] slice_sample!
    @ ~/.julia/dev/Pigeons/ext/PigeonsDynamicPPLExt/state.jl:67 [inlined]
  [4] step!(explorer::SliceSampler, replica::Pigeons.Replica{…}, shared::Pigeons.Shared{…})
    @ Pigeons ~/.julia/dev/Pigeons/src/explorers/SliceSampler.jl:28
  [5] explore!(pt::PT{…}, replica::Pigeons.Replica{…}, explorer::SliceSampler)
    @ Pigeons ~/.julia/dev/Pigeons/src/pt/pigeons.jl:107
  [6] explore!
    @ ~/.julia/dev/Pigeons/src/pt/pigeons.jl:96 [inlined]
  [7] macro expansion
    @ ~/.julia/dev/Pigeons/src/pt/pigeons.jl:50 [inlined]
  [8] macro expansion
    @ ./timing.jl:503 [inlined]
  [9] run_one_round!(pt::PT{Inputs{…}, Vector{…}, Pigeons.Shared{…}, @NamedTuple{…}})
    @ Pigeons ~/.julia/dev/Pigeons/src/pt/pigeons.jl:49
 [10] pigeons(pt::PT{Inputs{…}, Vector{…}, Pigeons.Shared{…}, @NamedTuple{…}})
    @ Pigeons ~/.julia/dev/Pigeons/src/pt/pigeons.jl:18
 [11] pigeons(pt_arguments::Inputs{TuringLogPotential{…}, Nothing, Nothing, Nothing, Nothing}, ::Pigeons.ThisProcess)
    @ Pigeons ~/.julia/dev/Pigeons/src/api.jl:19
 [12] #pigeons#166
    @ ~/.julia/dev/Pigeons/src/api.jl:16 [inlined]
 [13] top-level scope
    @ ~/.julia/dev/sandbox/pigeons_issue/run_pigeons_issue.jl:35
Some type information was truncated. Use `show(err)` to see complete types.

Do you have any recommendations?

Here is the code I am working with:

Code

using Pigeons
using Random 
using UtilityModels
using Turing

Random.seed!(8741)

# two choice sets, each a three-outcome gamble
choice_sets = [
    Gamble(; p = [0.20, 0.20, 0.60], v = [58, 56, 2]),
    Gamble(; p = [0.20, 0.20, 0.60], v = [96, 4, 2])
]

# generating model with known parameters
model = TAX(; 
    δ = .70, 
    β = .60, 
    γ = 0.70, 
    θ = 1.0
)

# simulate choices from the generating model
choices = rand(model, choice_sets, 10)
data = (choice_sets, choices)

# Turing model: priors on β, γ, δ; the TAX likelihood computes softmax probabilities internally
@model function tax(data)
    β ~ Gamma(.2, 4)
    γ ~ Gamma(.2, 4)
    δ ~ Normal(0, 2)
    #println("δ = $δ, γ = $γ, β = $β, logpdf $(logpdf(TAX(; δ, γ, β), data))")
    data ~ TAX(; δ, γ, β)
end

# run parallel tempering on the Turing target, recording traces
pt_tax = pigeons(target=TuringLogPotential(tax(data)), record=[traces])
miguelbiron commented 2 months ago

Hi @itsdfish! As you correctly identified, the core issue is working in probability space -- instead of logits -- which is not accurate enough for MCMC purposes. Specifically, the issue is here:

https://github.com/itsdfish/UtilityModels.jl/blob/06ed53fa5fb15e9f23476c90a08f86c7c7aa1526/src/UtilityModel.jl#L114

Now, I understand you do this because Distributions.jl only offers the basic Multinomial implementation. But just as they offer BernoulliLogit for numerically accurate logistic regression, they should have a multivariate counterpart.

You may have to implement MultinomialLogit, or even just CategoricalLogit, yourself. You can see how this is used, for example, in this Stan tutorial:

https://mc-stan.org/docs/2_20/stan-users-guide/multi-logit-section.html
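
For concreteness, here is a minimal sketch of what such a log-space density could look like (the name `multinomial_logit_logpdf` is hypothetical, not an existing Distributions.jl API; it assumes `logsumexp` from LogExpFunctions.jl and `logfactorial` from SpecialFunctions.jl):

```julia
using LogExpFunctions: logsumexp
using SpecialFunctions: logfactorial

# Multinomial log-density parameterized by logits, so probabilities are
# never materialized and softmax overflow/underflow is avoided.
function multinomial_logit_logpdf(logits::AbstractVector{<:Real}, counts::AbstractVector{<:Integer})
    n = sum(counts)
    # log-softmax: stable for arbitrarily large logits
    logp = logits .- logsumexp(logits)
    # multinomial coefficient computed in log space
    log_coef = logfactorial(n) - sum(logfactorial, counts)
    return log_coef + sum(counts .* logp)
end
```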

itsdfish commented 2 months ago

Thank you for your reply. I will try to implement MultinomialLogit or CategoricalLogit. Do you know if CategoricalLogit is sufficient for multiple responses, e.g., data = [3,3,4]? For example, can I multiply the log likelihood of each response by the corresponding value in the data vector and ignore the multinomial coefficient?

One last question: do you know why my hack did not work? For example, the following should not return Inf or -Inf:


function logpdf(model::UtilityModel, gambles::Vector{<:Gamble}, choice_idxs::Vector{<:Int})
    utility = mean.(model, gambles)
    util_exp = exp.(model.θ .* utility)           # can overflow to Inf for large θ * utility
    T = typeof(model.θ)
    n = sum(choice_idxs)
    p = util_exp ./ sum(util_exp)
    !isprobvec(p) ? (return T(-1e20)) : nothing   # invalid (e.g., NaN) probabilities → large negative value
    LL = logpdf(Multinomial(n, p), choice_idxs)
    return LL == Inf ? T(1e20) : LL               # clamps +Inf; note a -Inf value would pass through unchanged
end

Nonetheless, it returned an error similar to

δ = 0.7285379080643255, γ = 0.025087157296621306, β = 3.7406e-320, logpdf -3.1248093158291352
ERROR: Got an invalid log density after updating state at index 1:
- log density = Inf
- state[1]   = -743.6643604239493

I realize the above hack is not ideal, but the error was unexpected, so I thought it would be good to follow up.

Thanks again for your help.

miguelbiron commented 2 months ago

If you only care about inferring the probabilities and not the number of draws, then omitting the combinatorial factor does not matter. It will, however, affect the estimate of the log normalization constant given by Pigeons.
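
As a quick sanity check of this relationship (a sketch assuming Distributions.jl and SpecialFunctions.jl):

```julia
using Distributions
using SpecialFunctions: logfactorial

p = [0.2, 0.3, 0.5]
x = [3, 3, 4]                  # counts, e.g. data = [3, 3, 4]
n = sum(x)

# per-category log-probabilities weighted by counts, ignoring the coefficient
ll_no_coef = sum(x .* log.(p))

# adding the log multinomial coefficient recovers the exact Multinomial logpdf
log_coef = logfactorial(n) - sum(logfactorial, x)
ll_no_coef + log_coef ≈ logpdf(Multinomial(n, p), x)  # true
```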

On the other hand, implementing the correct value might allow you to submit it as a PR to Distributions.jl, and then others might be able to help you polish the code.

Finally, the error that you see is probably due to the prior and not the likelihood. The fact that the likelihood is flat and nonzero on the whole space outside of a compact set where softmax works will encourage the slicer to go way out into the tails. For example, I imagine that the value of -743 you see for the first coordinate is pretty rare under the prior.

itsdfish commented 2 months ago

It will however affect the estimate for the log normalization constant given by pigeons

That was one of my concerns. Thanks again for your help!

itsdfish commented 2 months ago

Hi @miguelbiron. I hope you don't mind a quick follow-up question. I am still trying to figure out a way to make my multinomial logit logpdf numerically stable. One thing that is still bothering me is that I cannot figure out which parameter values put me into the problematic region. Here is the information I print from logpdf right before the error:

p [0.5, 0.5] parms (-0.9340232427805155, 8.73219841381932e-9, 0.0, 1.0) logpdf -9.092276422448663
ERROR: SliceSampler supports contrained target, but the sampler should be initialized in the support: DynamicPPL.TypedVarInfo{@Named...

The logpdf looks fine, and the parameter values are in the support of both the prior distributions and the likelihood function. For reference, the priors are:

δ ~ Normal(0, 2)
β ~ Gamma(1, 4)
γ ~ Gamma(1, 4)
# the last parameter is fixed

I am having some difficulty parsing the error message below for parameter values. If I understand correctly, delta is [-0.9340232427805155], which matches the printout from logpdf, but gamma is [-18.55624867590815] and beta is [-Inf], which are both outside the support of the Gamma distribution. Is that a correct interpretation, or are other parameters producing the problem? Thanks in advance for your help.

ERROR: SliceSampler supports contrained target, but the sampler should be initialized in the support: DynamicPPL.TypedVarInfo{@NamedTuple{β::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:β, typeof(identity)}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:β, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, γ::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:γ, typeof(identity)}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:γ, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}, δ::DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:δ, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:δ, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}}, Float64}((β = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:β, typeof(identity)}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:β, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(β => 1), [β], UnitRange{Int64}[1:1], [-Inf], Gamma{Float64}[Gamma{Float64}(α=1.0, θ=4.0)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String, BitVector}("del" => [0], "trans" => [1])), γ = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:γ, typeof(identity)}, Int64}, Vector{Gamma{Float64}}, Vector{AbstractPPL.VarName{:γ, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(γ => 1), [γ], UnitRange{Int64}[1:1], [-18.55624867590815], Gamma{Float64}[Gamma{Float64}(α=1.0, θ=4.0)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String, BitVector}("del" => [0], "trans" => [1])), δ = DynamicPPL.Metadata{Dict{AbstractPPL.VarName{:δ, typeof(identity)}, Int64}, Vector{Normal{Float64}}, Vector{AbstractPPL.VarName{:δ, typeof(identity)}}, Vector{Float64}, Vector{Set{DynamicPPL.Selector}}}(Dict(δ => 1), [δ], UnitRange{Int64}[1:1], [-0.9340232427805155], Normal{Float64}[Normal{Float64}(μ=0.0, σ=2.0)], Set{DynamicPPL.Selector}[Set()], [0], Dict{String, BitVector}("del" => [0], "trans" => [1]))), Base.RefValue{Float64}(-Inf), Base.RefValue{Int64}(131))

miguelbiron commented 2 months ago

Hey @itsdfish -- by default, Pigeons works with DynamicPPL models in unconstrained space. That is why you are seeing negative values: they correspond to the log of the original parameter. That in itself is not a problem.
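
To make the correspondence concrete, the unconstrained values from the error dump above map back through exp (assuming the standard log transform used for the positive-support Gamma priors):

```julia
# γ: unconstrained value -18.556... maps back to the γ printed by logpdf
exp(-18.55624867590815)  # ≈ 8.73219841381932e-9

# β: unconstrained value -Inf maps back to exactly 0.0 (the β printed by logpdf),
# i.e. β underflowed to zero on the original scale
exp(-Inf)                # == 0.0
```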

To help you understand where the issue arises, I reproduced the error in Pigeons. I then took the offending set of parameters and modified them slightly. The only one that needs to be modified to give non-Inf values is beta, as the plot below shows:

[Plot: log density as a function of logβ]

Code to reproduce

```julia
using Pigeons
using SplittableRandoms
using Random
using UtilityModels
using DynamicPPL

Random.seed!(8741)

choice_sets = [
    Gamble(; p = [0.20, 0.20, 0.60], v = [58, 56, 2]),
    Gamble(; p = [0.20, 0.20, 0.60], v = [96, 4, 2])
]

model = TAX(; δ = .70, β = .60, γ = 0.70, θ = 1.0)

choices = rand(model, choice_sets, 10)
data = (choice_sets, choices)

@model function tax(data)
    δ ~ Normal(0, 2)
    β ~ Gamma(1, 4)
    γ ~ Gamma(1, 4)
    data ~ TAX(; δ, γ, β)
end

target = TuringLogPotential(tax(data))

rng = SplittableRandom(1)
vi = Pigeons.initialization(target, rng, 1)

# set to values found to give an error
Pigeons.update_state!(vi, :δ, 1, 3.8607330400453557)
Pigeons.update_state!(vi, :β, 1, 2.343675578101852)
Pigeons.update_state!(vi, :γ, 1, -0.38113052708856787)

using Plots
βs = range(-1.0, 1.0, 100)
lp_vals = map(βs) do β
    Pigeons.update_state!(vi, :β, 1, β)
    target(vi)
end
plot(βs, lp_vals, label = "", xlabel = "logβ", ylabel = "Log density")
```

itsdfish commented 2 months ago

Excellent. Thank you for clarifying that the parameters are in log space, and thanks for the example and code above.