TuringLang / MCMCChains.jl

Types and utility functions for summarizing Markov chain Monte Carlo simulations
https://turinglang.org/MCMCChains.jl/
Other
268 stars 28 forks source link

DIC example not working #267

Closed itsdfish closed 3 years ago

itsdfish commented 3 years ago

I want to report that the DIC example is not working. Here is the framework that is presented in the documentation:

chn ... # sampling results
lpfun = function f(chain::Chains) # function to compute the logpdf values
    niter, nparams, nchains = size(chain)
    lp = zeros(niter + nchains) # resulting logpdf values
    for i = 1:nparams
        lp += map(p -> logpdf( ... , x), Array(chain[:,i,:]))
    end
    return lp
end
DIC, pD = dic(chn, lpfun)

There are several errors with the dimensions. I'm not sure how DIC is calculated, but it seems odd that the loop of the following function iterates over parameters. The logpdf should take an entire vector, not just one parameter.

In addition, I want to suggest changing the API as follows:

DIC, pD ​=​ ​dic​(chn, distribution_object, the_data)

The chain, distribution object and data would get passed to a corrected version of the function above. This would make it easier for the user to use.

I tried fixing these issues, but I have no idea how DIC is calculated. If you have a quick fix for lpfun, I can put the rest together.

devmotion commented 3 years ago

I just checked, and the example is removed from the README in https://github.com/TuringLang/MCMCChains.jl/pull/265 :stuck_out_tongue: Apart from that, I have never worked with dic and am not familiar with the implementation...

luiarthur commented 3 years ago

There are two common ways of computing DIC. See: https://en.wikipedia.org/wiki/Deviance_information_criterion

The metric, commonly used in Bayesian contexts, attempts to provide a metric of goodness of fit while adjusting for over-fitting. As you can imagine, as you increase the number of parameters in your model, the fit will likely improve, though you may be over-fitting in the sense that the extra parameters are not needed to obtain a "good fit".

The Gelman definition of the DIC is: image

with: image

The deviance (D in DIC) is just -2 * loglikelihood(data | parameters).

Dbar is the goodness of fit part. It is essentially (2 times) the negative of the log likelihood. So smaller Dbar is better.

pD is a "penalty" term that penalizes for model complexity. As you can see, it is related to the variance of the log likelihood, which (the variance) is guaranteed to be positive. When your model is too complex (perhaps has more parameters than required to obtain good fit), pD will increase.

DIC = Dbar + pD will be low when both Dbar and pD are low. i.e. good fit, and low model complexity. Models with low DIC are preferred.

If you have some posterior samples for parameters theta, then you can compute Dbar by averaging over D(theta) for each theta. Similarly, pD (same as pV in Gelman definition) is 0.5 times the variance of D evaluated at each posterior sample of theta.

Edit: The DIC as defined by Gelman is probably easier to implement generally. With the Spiegelhalter definition, you may have to worry about things like label switching in a mixture model. (Taking an average naively over the parameters may not make sense.)

Also, in a model comparison setting, the normalizing constant for the log likelihood "cancels out", so is not required.

luiarthur commented 3 years ago

Say you have (in a vector) evaluations of the log likelihood for each posterior sample, you could implement a function to compute the DIC like this.

import Statistics: mean, var

deviance(loglikelihood::Real) = -2 * loglikelihood

function dic(loglikelihood::AbstractVector{<:Real})
  D = deviance.(loglikelihood)
  return mean(D) + 0.5 * var(D)
end
itsdfish commented 3 years ago

@luiarthur, thanks for your help. This looks feasible. Let me look into this during the weekend when I have more bandwidth. I'll either submit a PR or tell you that I am unable to make it work.

goedman commented 3 years ago

Chris,

The deviance part is also computed for WAIC and PSIS in ParetoSmoothedImportanceSampling.jl.

Rob

itsdfish commented 3 years ago

Thanks, Rob. I wonder whether it makes more sense to refer users to your PSS package? If so, would you be willing export DIC? I could make those changes easily.

goedman commented 3 years ago

Absolutely! Maybe for historical reasons I should also add AIC. But both DIC and AIC are mentioned in SR but dropped in favor of WAIC for non (Mv)Normal cases.

itsdfish commented 3 years ago

Sounds like a plan. In that case, I will close this issue. I should have time in the next day or two to submit a PR.

rikhuijzer commented 3 years ago

Note that this PR is also discussed at https://discourse.julialang.org/t/bayesian-model-selection/55725/5.