Closed sdwfrost closed 10 years ago
Thanks for opening the issue @sdwfrost, I will look into it within the next few days and will get back to you.
Also, DataArrays is not automatically installed by running Pkg.add("MCMC")
I just added MCMC to the REQUIRE file. I don't understand the codebase well enough to make other fixes.
John, I have been debating with myself about this; what is better practice, should I be using data arrays or more generic float64 arrays in various places as function input arguments in the package?
Do you want to support missing data in inputs? If so, I'd use DataArrays. You can always convert them to Array{Float64}
behind the scenes using array(data, NaN)
to substitute NaN
for NA
.
I always consider NA's evil when working with MCMC... I hope not because I am prejudiced, but mostly because of the statistical dependence between successive points in a Monte Carlo chain, meaning that an NA occurrence ruins all the chain. From that point of view it may be better to stick with float64 arrays. If someone in the Monte Carlo or statistical community objects my thinking, please speak up.
I'm confused: in most MCMC systems like Jags that I've used in the past, the system will impute the values of missing entries based on their local density function. Then the system will continually resample the imputations based on all of the other nodes in the graphical model.
Yes, imputation is a possibility, yet it is sth that we haven't implemented in our MCMC package. Your suggestion makes sense to me. Would you agree that it would be more reasonable not to allow for NA's in MCMC chains until we support imputation of NA values? I have never tried to implement the idea in MCMC context (only in statistical genetics but this is a completely different area), so I don't have the knowledge to add this functionality to our package at the moment without reading first.
Yes, I agree that, if you're not imputing missing data, then it's reasonable to not allow it as input.
Hi folks,
FWIW- If users provide their own likelihood function, then they may be able to deal with NA's on their own, even if the MCMC package's model doesn't know how yet. Further, it seems like something that'll be added in the future. So my suggestion would be to allow NA's for input data, but that MCMC outputs should not have NA's.
Cheers, Eric
On Sun, Jan 12, 2014 at 7:04 PM, John Myles White notifications@github.comwrote:
Yes, I agree that, if you're not imputing missing data, then it's reasonable to not allow it as input.
— Reply to this email directly or view it on GitHubhttps://github.com/JuliaStats/MCMC.jl/issues/28#issuecomment-32128691 .
Eric Ford Professor of Astronomy & Astrophysics Center for Exoplanets & Habitable Worlds Center for Astrostatistics Institute for CyberScience Pennsylvania State University
The complication with letting users handle NA's is that the package probably wants to insist that all computations are done using Array{Float64}
rather than DataArray{Float64}
. For example, the Optim package requires that the user-specified objective function be a function of the form f: Vector{Float64} -> Float64
. So NA's need to be handled with some caution when allowed as inputs.
Since it has been agreed not to allow NA's in MCMC
(at least not at this stage), I will close this old ticket for housekeeping purposes.
DataArray isn't defined in SerialMC.jl ; a fix is to explicitly import it
and/or to add DataArrays to REQUIRE