brian-j-smith / Mamba.jl

Markov chain Monte Carlo (MCMC) for Bayesian analysis in julia

Documentation on handling missing values #96

Closed kkmann closed 5 years ago

kkmann commented 8 years ago

Hey,

I am a little lost about how missing data is handled in Mamba. I checked the bones example, but it does not offer much explanation, and the documentation on the MISS sampler did not clarify the issue for me either. Say I have missing data in both the covariates and the dependent variable in a simple regression problem: can I still use the MISS sampler? Why can't I use NUTS in this case? In JAGS, for example, no special treatment of missing values is required.

PS: Are you planning to support DataFrames with missing values as input / inits in the long run?

Best,

Kevin

brian-j-smith commented 8 years ago

Hi @kkmann: Right now, MISS can be used for missing values in (terminal) stochastic nodes, such as the dependent variable in a regression model, but not for non-stochastic or non-terminal nodes. MISS samples directly from the associated full conditional distribution, which is more efficient than any of the other sampling routines, including NUTS. So the extra time and effort needed to get all samplers to support missing values would not really improve performance.
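To make the idea of sampling from the full conditional concrete, here is a minimal sketch in plain Python using only the standard library. It is not Mamba code and does not reflect how MISS is implemented. It assumes a simple linear regression with a normal likelihood, and the function name `impute_missing_responses`, the use of `None` to mark missing responses, and the parameter names are all hypothetical choices for illustration. Because a terminal response has no child nodes, its full conditional is just its likelihood distribution given the current parameter values, so each missing response can be drawn directly:

```python
import random


def impute_missing_responses(y, x, beta0, beta1, sigma, rng):
    """Return a copy of y with each missing entry (None) replaced by a draw
    from Normal(beta0 + beta1 * x_i, sigma).

    For a terminal response node this normal distribution is the full
    conditional, given the current values of beta0, beta1 and sigma.
    Observed entries are returned unchanged.
    """
    return [
        rng.gauss(beta0 + beta1 * xi, sigma) if yi is None else yi
        for xi, yi in zip(x, y)
    ]


# Illustrative data: two observed and two missing responses.
rng = random.Random(1)  # seeded generator so the sketch is reproducible
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, None, 2.9, None]

# One imputation step at fixed (assumed) parameter values.
y_draw = impute_missing_responses(y, x, beta0=0.0, beta1=1.0, sigma=0.5, rng=rng)
```

In an MCMC scheme, a step like this would be repeated at every iteration with the current parameter draws, while the regression parameters themselves are updated by some other sampler. Missing covariates do not fit this pattern, because they are not terminal nodes: other nodes depend on them, so their full conditional is no longer simply their own distribution.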

Note that the handling of missing values is transparent/automated in JAGS because an expert system is used by that software to pick the samplers for nodes.

That's a good question about support for DataFrames. I thought about it when doing the initial implementation, but went with the current approach because the coding was simpler. It's still on my radar though.

kkmann commented 8 years ago

Ah, too bad. Are you planning on extending the functionality in this direction? I understand that your primary focus is not data analysis but MCMC research itself, but I must say that I really appreciated the Mamba interface and the flexibility of having the full power of Julia at my fingertips (unlike with JAGS). However, convenient handling of missing data is a must in my application, and I am not expert enough to implement the samplers myself (actually, the extent of missingness pushed me towards Bayesian methods in the first place).

Anyway, thanks for the amazing work so far!

brian-j-smith commented 8 years ago

I certainly appreciate and take positive feedback like this into account when planning future extensions. I'll keep the missing value issue in mind.