Closed: itsdfish closed this issue 3 years ago.
I just checked, and the example is removed from the README in https://github.com/TuringLang/MCMCChains.jl/pull/265 :stuck_out_tongue: Apart from that, I have never worked with DIC and am not familiar with the implementation...
There are two common ways of computing DIC. See: https://en.wikipedia.org/wiki/Deviance_information_criterion
DIC, commonly used in Bayesian contexts, attempts to measure goodness of fit while adjusting for over-fitting. As you can imagine, increasing the number of parameters in your model will likely improve the fit, but you may be over-fitting in the sense that the extra parameters are not needed to obtain a "good fit".
The Gelman definition of the DIC is:

    DIC = Dbar + pV

with:

    pV = 0.5 * var(D(theta))

The deviance (the D in DIC) is just `D(theta) = -2 * loglikelihood(data | theta)`.
Dbar is the goodness-of-fit part: it is essentially (2 times) the negative of the log likelihood, averaged over the posterior samples. So a smaller Dbar means a better fit.
pD is a "penalty" term that penalizes model complexity. As you can see, it is proportional to the variance of the deviance across posterior samples, which (being a variance) is guaranteed to be non-negative. When your model is too complex (perhaps has more parameters than required to obtain a good fit), pD will increase.
DIC = Dbar + pD will be low when both Dbar and pD are low, i.e. good fit and low model complexity. Models with low DIC are preferred.
If you have some posterior samples for parameters theta, then you can compute Dbar by averaging D(theta) over each theta. Similarly, pD (the same as pV in the Gelman definition) is 0.5 times the variance of D evaluated at each posterior sample of theta.
Edit: The DIC as defined by Gelman is probably easier to implement generally. With the Spiegelhalter definition, you may have to worry about things like label switching in a mixture model. (Taking an average naively over the parameters may not make sense.)
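For concreteness, here is a sketch of that Spiegelhalter-style variant, where pD = Dbar - D(theta_bar). The function name, the `loglik` callable, and the draws-as-matrix layout (one column per posterior sample) are all illustrative assumptions, not an existing API:

```julia
import Statistics: mean

# Sketch of the Spiegelhalter (2002) DIC, where pD = Dbar - D(theta_bar).
# `loglik` and the layout of `draws` are assumptions made for this example:
# each column of `draws` is one posterior sample of theta.
function dic_spiegelhalter(loglik::Function, draws::AbstractMatrix{<:Real})
    D(theta) = -2 * loglik(theta)           # deviance at one sample
    Dbar = mean(D(draws[:, i]) for i in 1:size(draws, 2))
    theta_bar = vec(mean(draws, dims = 2))  # naive posterior mean of theta
    pD = Dbar - D(theta_bar)
    return Dbar + pD
end
```

Note that `theta_bar` is a plain average over the draws, which is exactly the step that can go wrong under label switching in a mixture model.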
Also, in a model comparison setting, the normalizing constant for the log likelihood "cancels out", so is not required.
Say you have (in a vector) evaluations of the log likelihood at each posterior sample; then you could implement a function to compute the DIC like this:

```julia
import Statistics: mean, var

# Deviance of a single log-likelihood evaluation.
deviance(loglikelihood::Real) = -2 * loglikelihood

# Gelman-style DIC: mean deviance plus half the variance of the deviance.
function dic(loglikelihood::AbstractVector{<:Real})
    D = deviance.(loglikelihood)
    return mean(D) + 0.5 * var(D)
end
```
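As a quick sanity check, here is a toy run of that function. The normal model and the fake "posterior draws" of `mu` are made up purely for illustration, and the definitions are repeated so the snippet runs standalone:

```julia
import Statistics: mean, var
import Random

deviance(loglikelihood::Real) = -2 * loglikelihood

function dic(loglikelihood::AbstractVector{<:Real})
    D = deviance.(loglikelihood)
    return mean(D) + 0.5 * var(D)
end

# Log likelihood of the whole data set under Normal(mu, 1), written out
# by hand to keep the example dependency-free.
normal_loglik(mu, x) = sum(@. -0.5 * (x - mu)^2 - 0.5 * log(2pi))

Random.seed!(1)
data = randn(100)                 # made-up data
mus = 0.1 .* randn(1_000)         # stand-in for posterior draws of mu
loglik = [normal_loglik(mu, data) for mu in mus]
dic(loglik)                       # a single number; lower is better
```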
@luiarthur, thanks for your help. This looks feasible. Let me look into this during the weekend when I have more bandwidth. I'll either submit a PR or tell you that I am unable to make it work.
Chris,
The deviance part is also computed for WAIC and PSIS in ParetoSmoothedImportanceSampling.jl.
Rob
Thanks, Rob. I wonder whether it makes more sense to refer users to your PSS package? If so, would you be willing to export DIC? I could make those changes easily.
Absolutely! Maybe for historical reasons I should also add AIC. Both DIC and AIC are mentioned in SR, but they are dropped in favor of WAIC for non-(Mv)Normal cases.
Sounds like a plan. In that case, I will close this issue. I should have time in the next day or two to submit a PR.
Note that this PR is also discussed at https://discourse.julialang.org/t/bayesian-model-selection/55725/5.
I want to report that the DIC example is not working. Here is the framework that is presented in the documentation:
There are several errors with the dimensions. I'm not sure how DIC is calculated, but it seems odd that the loop in the following function iterates over parameters. The logpdf should take the entire parameter vector, not just one parameter.
In addition, I want to suggest changing the API as follows:
```julia
DIC, pD = dic(chn, distribution_object, the_data)
```
The chain, the distribution object, and the data would be passed to a corrected version of the function above. This would make the function easier to use.
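To make the suggestion concrete, here is a rough sketch of what such a signature could look like. Everything here is hypothetical: `samples` stands in for the chain (one row per posterior draw), and `lpfun(theta, x)` stands in for the distribution object, returning the log density of one observation `x` given the full parameter vector `theta`:

```julia
import Statistics: mean, var

# Hypothetical sketch of the proposed `DIC, pD = dic(chn, distribution_object, the_data)`
# API. `samples` plays the role of the chain and `lpfun` the role of the
# distribution object; neither name matches an existing package interface.
function dic(samples::AbstractMatrix{<:Real}, lpfun, data)
    # Full-data log likelihood at each posterior draw (each row of `samples`).
    loglik = [sum(lpfun(samples[i, :], x) for x in data) for i in 1:size(samples, 1)]
    D = -2 .* loglik
    pD = 0.5 * var(D)
    return mean(D) + pD, pD    # (DIC, pD), matching the proposed return values
end
```

The key point is that `lpfun` receives the entire parameter vector at once, rather than looping over parameters one at a time.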
I tried fixing these issues, but I have no idea how DIC is calculated. If you have a quick fix for `lpfun`, I can put the rest together.