brian-j-smith / Mamba.jl

Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia

NUTSVariate() for distinct blocks sampling #169

Closed jojal5 closed 4 years ago

jojal5 commented 4 years ago

Hi,

I rewrote the Line: Block-Specific Sampling with AMWG and Slice example to illustrate the problem.

In this example, if the AMWGVariate command is replaced with NUTSVariate, an error is thrown stating that a variable belonging to another block is not defined. To work around the problem, it is possible to first define the sampler with AMWGVariate and then redefine it with NUTSVariate, as in the code below. Is this normal? Thank you for the excellent package.
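For context, here is a minimal sketch of the failing order, using the names from the full listing below: the gradient function for the β block reads the global ϕ, so constructing its NUTS variate errors while ϕ is still undefined.

β = NUTSVariate([0.0, 0.0], logfgrad1)   # fails: logfgrad1 references ϕ, which is not defined yet
ϕ = NUTSVariate([.1], logfgrad2)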

################################################################################
## Linear Regression
##   y ~ N(β₀ + β₁ * x, σ²)
##   b0, b1 ~ N(0, 1000)
##   s2 ~ invgamma(0.001, 0.001)
################################################################################

using Mamba, ForwardDiff, Distributions

## Data
data = Dict{Symbol, Any}(
  :x => [1, 2, 3, 4, 5],
  :y => [1, 3, 3, 3, 5]
)

## Log-transformed unnormalized joint posterior for b0, b1, and log(s2)
function logf(θ::Vector{<:Real})
    β₀ = θ[1]
    β₁ = θ[2]
    ϕ = θ[3]  # ϕ = log(σ²)

    y = data[:y]

    μ = β₀ .+ β₁ * data[:x]
    σ = sqrt(exp(ϕ))

    pd = Normal.(μ, σ)

    ll = sum(logpdf.(pd, y)) + logpdf(Normal(0, 1000), β₀) +
        logpdf(Normal(0, 1000), β₁) + logpdf(Gamma(.001, .001), exp(ϕ))

    return ll
end

f1(β) = logf([β; ϕ])                  # β-block: the current global ϕ is held fixed
Δf1(β) = ForwardDiff.gradient(f1, β)

function logfgrad1(β::Vector{<:Real})
    ll = f1(β)
    g = Δf1(β)

    return ll, g
end

f2(ϕ) = logf([β; ϕ])                  # ϕ-block: the current global β is held fixed
Δf2(ϕ) = ForwardDiff.gradient(f2, ϕ)

function logfgrad2(ϕ::Vector{<:Real})
    ll = f2(ϕ)
    g = Δf2(ϕ)

    return ll, g
end

## MCMC simulation
n = 10000
burnin = 1000
sim = Chains(n, 3, names = ["β₀", "β₁", "σ²"])

######################################
# Those two lines must be added for NUTSVariate to work
# β = AMWGVariate([0.0, 0.0], 1.0, logf) 
# ϕ = AMWGVariate([0.0], 1.0, logf)
######################################
β = NUTSVariate([0.0, 0.0], logfgrad1)
ϕ = NUTSVariate([.1], logfgrad2)
for i in 1:n
  sample!(β, adapt = (i <= burnin))
  sample!(ϕ)
  sim[i, :, 1] = [β; exp.(ϕ)]
end
describe(sim)
brian-j-smith commented 4 years ago

Cool implementation.

An initial value for ϕ is needed in this case since it appears in logfgrad1, which NUTSVariate() calls to get an initial value for one of the tuning parameters (step size). Something like the following should do.

ϕ = [.1]
β = NUTSVariate([0.0, 0.0], logfgrad1)
ϕ = NUTSVariate([.1], logfgrad2)

You'll probably also want to use an inverse-Gamma prior for the variance in the log-posterior.

logpdf(InverseGamma(.001, .001), exp(ϕ))
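Inside logf, that amounts to swapping the Gamma term for InverseGamma; a minimal sketch of the revised line, everything else unchanged:

ll = sum(logpdf.(pd, y)) + logpdf(Normal(0, 1000), β₀) +
    logpdf(Normal(0, 1000), β₁) + logpdf(InverseGamma(.001, .001), exp(ϕ))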

Now, I'm noticing that the NUTS sampler output for the variance is lower than that from the Gibbs sampler in the package tutorial (posterior mean 0.55 vs 1.22). The beta output is comparable. If the prior on the variance is dropped from your log-posterior, the NUTS variance output is more comparable, which makes me think the prior is having a different effect in the two implementations.
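For reference, the check described above can be written as a variant of logf with the variance prior term dropped; a sketch reusing the data and names from the first post (the name logf_noprior is just for illustration):

## Same log-posterior, but without the prior term on the variance
function logf_noprior(θ::Vector{<:Real})
    β₀, β₁, ϕ = θ[1], θ[2], θ[3]
    μ = β₀ .+ β₁ * data[:x]
    σ = sqrt(exp(ϕ))
    return sum(logpdf.(Normal.(μ, σ), data[:y])) +
        logpdf(Normal(0, 1000), β₀) + logpdf(Normal(0, 1000), β₁)
end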

jojal5 commented 4 years ago

Thank you for the answer. It resolves the issue.

And yes, I meant the Inverse-gamma distribution :)