TuringLang / Turing.jl

Bayesian inference with probabilistic programming.
https://turinglang.org
MIT License

Cannot sample from bounded distribution when using Dual types #2088

Open itsdfish opened 11 months ago

itsdfish commented 11 months ago

Hi all,

I believe I found a bug while trying to generate posterior predictive distributions. As shown below, sampling from a distribution inside a model works unless the distribution is bounded and more than one sample is drawn. The error also occurs with other bounded distributions, such as Gamma. It looks like it could be an issue with either Turing or Distributions. Sorry if I filed this with the wrong package.

Works

using Turing 

@model function test_model()
    μ ~ Gamma(2, 2)
    σ ~ Gamma(2, 2)
    return rand(Normal(μ, σ))
end

chain = sample(test_model(), NUTS(1000, 0.85), MCMCThreads(), 1000, 1)

Works

using Turing 

@model function test_model()
    μ ~ Gamma(2, 2)
    σ ~ Gamma(2, 2)
    return rand(Normal(μ, σ), 1)
end

chain = sample(test_model(), NUTS(1000, 0.85), MCMCThreads(), 1000, 1)

Works

using Turing 

@model function test_model()
    μ ~ Gamma(2, 2)
    σ ~ Gamma(2, 2)
    return rand(InverseGaussian(μ, σ))
end

chain = sample(test_model(), NUTS(1000, 0.85), MCMCThreads(), 1000, 1)

Fails

using Turing 

@model function test_model()
    μ ~ Gamma(2, 2)
    σ ~ Gamma(2, 2)
    return rand(InverseGaussian(μ, σ), 1)
end

chain = sample(test_model(), NUTS(1000, 0.85), MCMCThreads(), 1000, 1)

Truncated error

ERROR: TaskFailedException

    nested task error: TaskFailedException

        nested task error: MethodError: no method matching Float64(::ForwardDiff.Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}, Float64, 2})

        Closest candidates are:
          (::Type{T})(::Real, ::RoundingMode) where T<:AbstractFloat
           @ Base rounding.jl:207
          (::Type{T})(::T) where T<:Number
           @ Core boot.jl:792
          (::Type{T})(::AbstractChar) where T<:Union{AbstractChar, Number}
           @ Base char.jl:50
          ...

        Stacktrace:
          [1] convert(#unused#::Type{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}, Float64, 2})
            @ Base ./number.jl:7
          [2] setindex!(A::Vector{Float64}, x::ForwardDiff.Dual{ForwardDiff.Tag{Turing.TuringTag, Float64}, Float64, 2}, i1::Int64)
            @ Base ./array.jl:969
          [3] _rand!
            @ ~/.julia/packages/Distributions/wyOWr/src/univariates.jl:147 [inlined]
          [4] rand!
            @ ~/.julia/packages/Distributions/wyOWr/src/univariates.jl:142 [inlined]
          [5] rand
            @ ~/.julia/packages/Distributions/wyOWr/src/genericrand.jl:52 [inlined]
          [6] rand
            @ ~/.julia/packages/Distributions/wyOWr/src/genericrand.jl:24 [inlined]
          [7] rand
            @ ~/.julia/packages/Distributions/wyOWr/src/genericrand.jl:22 [inlined]

versions

  Julia 1.9.3
  [31c24e10] Distributions v0.25.102
  [366bfd00] DynamicPPL v0.23.18
  [fce5fe82] Turing v0.29.1
torfjelde commented 11 months ago

This is an unfortunate interaction between ForwardDiff.jl and Distributions.jl :confused:

I'd probably just do something like:

@model function test_model_inference()
    μ ~ Gamma(2, 2)
    σ ~ Gamma(2, 2)
    return μ, σ
end

@model function test_model()
    @submodel μ, σ = test_model_inference()
    return rand(InverseGaussian(μ, σ), 1)
end

# Use `test_model_inference` for, well, inference.
chain = sample(test_model_inference(), NUTS(1000, 0.85), MCMCThreads(), 1000, 1)

# Use `test_model` for everything else.
generated_quantities(test_model(), chain)
itsdfish commented 11 months ago

Thanks for the suggestion. I came up with my own (see below).

My goal was to develop a general function for my package which generates a predictive quantity:

function my_function(dist, n, parms...)
       sim_data = rand(dist(parms...), n)
       # some cool stuff here
end

From my perspective, generated_quantities is not as useful if it cannot process arbitrary functions. There are two workarounds I can think of. One is the two-step process that you proposed; this adds extra steps, makes this specific function different from others, and complicates things for users. The other is to use a comprehension, `map`, or a for loop instead of `rand(dist(parms...), n)`, which is a somewhat annoying exception on the developer side but makes the user experience better.
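The second workaround could be sketched as follows (reusing the hypothetical `my_function` from above; drawing samples one at a time lets the element type be inferred from each draw, so Duals flow through):

```julia
using Distributions

# Draw n samples one at a time instead of rand(dist(parms...), n), so no
# Vector{Float64} is preallocated and Dual-valued parameters still work.
function my_function(dist, n, parms...)
    sim_data = [rand(dist(parms...)) for _ in 1:n]
    # some cool stuff here
    return sim_data
end

sim = my_function(InverseGaussian, 3, 2.0, 2.0)  # a 3-element vector
```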

I think it would be a good idea to fix this issue if feasible. I'd be glad to help if I can.

torfjelde commented 11 months ago

From my perspective, generated_quantities is not as useful if it cannot process arbitrary functions.

I'm confused by this; generated_quantities supports arbitrary Julia code.

torfjelde commented 11 months ago

I think it would be a good idea to fix this issue if feasible. I'd be glad to help if I can.

It's quite inconvenient to fix, tbh. It's an issue that shows up in the interaction between Distributions.jl and ForwardDiff.jl. We could fix it in Turing.jl, but that would be type piracy. And fixing it in Distributions.jl is somewhat non-trivial, because ForwardDiff is the one being "weird" in this scenario (if my hunch is correct).

Another thing you can do, is to check whether you're running inference or not:

# Generic.
isinference(ctx::DynamicPPL.AbstractContext) = isinference(DynamicPPL.NodeTrait(ctx), ctx)
isinference(::DynamicPPL.IsParent, ctx::DynamicPPL.AbstractContext) = isinference(DynamicPPL.childcontext(ctx))
isinference(::DynamicPPL.IsLeaf, ctx::DynamicPPL.AbstractContext) = false
# Special cases: assume everything that isn't using `SampleFromPrior` to be performing inference.
isinference(ctx::DynamicPPL.SamplingContext) = !(ctx.sampler isa DynamicPPL.SampleFromPrior)

@model function test_model()
    μ ~ Gamma(2, 2)
    σ ~ Gamma(2, 2)

    if isinference(__context__)
        return nothing
    else
        return rand(InverseGaussian(μ, σ))
    end
end

If these things garner enough interest, we can easily add such functionality to DynamicPPL.jl (and thus Turing.jl).

Related issue: https://github.com/TuringLang/DynamicPPL.jl/issues/510

torfjelde commented 11 months ago

But I will point out that the usage of @submodel I demonstrated above is also just good practice in general: put the stuff you want to run during inference inside one model, and put the stuff you only need for post-processing in a parent model. This way you avoid unnecessary computation when performing inference; in your example, you're unnecessarily calling rand every time the model is evaluated during inference.

itsdfish commented 11 months ago

@torfjelde, thank you for your explanation. The case you made for using generated quantities with a submodel is compelling. I'll give that a try and if it works out, I will use it as a general approach/recommendation.

I wonder if it would be worth adding another example in the docs explaining the benefits of using submodels. If so, I can submit a PR to that effect.

Thanks again!