Incorporating survey weights

fdabl commented 3 years ago

Hi Donny,

I know the population joint distribution of some demographic variables (sex, education, gender) and would like to weight the sample so that it matches this population distribution. The package 'survey' allows this for simple statistics such as the mean as well as for generalized linear models. I guess that they weight the likelihood under the hood so that it is consistent with the population distribution.

Do you have any plans to add such weighting functionality to BGGM? I think this would be a pretty cool feature! I guess one can already do this in a 'hacky' way by resampling the data according to the population joint distribution and estimating a network for each resampled data set. My intuition is that this should give the correctly weighted result, in expectation. But the computations take a while, and the post-processing might be a bit annoying. So ideally one would incorporate the weighting in the estimation procedure itself.
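To make the resampling idea concrete, here is a rough sketch in plain Python (not BGGM code). It assumes the weights are already known; `estimate` is a hypothetical stand-in for whatever is fitted to each resampled data set (a network in practice, a simple mean here), and the toy data are made up.

```python
import random

def weighted_resample(rows, weights, rng):
    """Draw len(rows) rows with replacement, with probability proportional to weights."""
    return rng.choices(rows, weights=weights, k=len(rows))

def resampled_estimate(rows, weights, estimate, n_resamples=200, seed=1):
    """Average a scalar estimate over many weighted resamples of the data."""
    rng = random.Random(seed)
    draws = [estimate(weighted_resample(rows, weights, rng))
             for _ in range(n_resamples)]
    return sum(draws) / len(draws)

# Toy data (made up): group "a" is 80% of the sample but 50% of the population.
rows = [("a", 1.0)] * 8 + [("b", 3.0)] * 2
# Weight = population share / sample share of the row's group.
weights = [0.5 / 0.8] * 8 + [0.5 / 0.2] * 2

def mean_of_value(sample):
    """Hypothetical stand-in for the model fit: the mean of the second column."""
    return sum(v for _, v in sample) / len(sample)

avg = resampled_estimate(rows, weights, mean_of_value)
unweighted = mean_of_value(rows)
```

In this toy case the resampled average lands near the population-weighted mean rather than the unweighted sample mean, which is the "correct in expectation" intuition. The cost is one model fit per resample.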

Anyway, curious what you think!

Cheers, Fabian

donaldRwilliams commented 3 years ago

hey !

Nice to hear from you :-)

I honestly have not considered that before, but I think it is a great idea.

Can you take a look at the cov.wt function in base R to see if that would work ? If so, that would be rather straightforward to implement.
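For reference, the kind of weighted covariance that cov.wt computes can be sketched as below. This is only an illustration in plain Python rather than R: it assumes the weights are normalized to sum to one and uses the maximum-likelihood-style divisor, and the name `weighted_cov` is made up for this example.

```python
def weighted_cov(rows, weights):
    """Return the weighted mean vector and weighted covariance matrix.

    Weights are normalized to sum to 1, so equal weights give the
    ordinary divide-by-n covariance.
    """
    total = sum(weights)
    w = [wi / total for wi in weights]
    p = len(rows[0])
    center = [sum(wi * r[j] for wi, r in zip(w, rows)) for j in range(p)]
    cov = [[sum(wi * (r[j] - center[j]) * (r[k] - center[k])
                for wi, r in zip(w, rows))
            for k in range(p)]
           for j in range(p)]
    return center, cov

# Toy data (made up): four observations on two variables.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
center_eq, cov_eq = weighted_cov(rows, [1, 1, 1, 1])  # equal weights
center_w, cov_w = weighted_cov(rows, [3, 1, 1, 1])    # first row up-weighted
```

As far as I know, cov.wt defaults to an unbiased-type divisor instead, so its output would differ from this sketch by a scaling factor unless the ML method is requested.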

Other than that, I would need some ideas, papers describing some possible approaches, etc.

fdabl commented 3 years ago

Same, thanks for your quick reply :-)

Yeah, that might indeed just work! This particular weighting business I'm talking about is called post-stratification. The R package 'survey' has some information on this (and many other things). This and this link also provide the basic idea.

Knowing the population distribution one wants to match, one could just provide BGGM with the relevant weights, and if you can incorporate them with e.g. cov.wt, I think that should be it!
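To spell out what "the relevant weights" would be: under post-stratification each respondent gets the weight (population share of their cell) / (sample share of their cell). A minimal sketch in plain Python, where the cell labels and population shares are made up for illustration:

```python
from collections import Counter

def poststrat_weights(cells, population_share):
    """One weight per respondent: population share of the cell / sample share of the cell."""
    n = len(cells)
    counts = Counter(cells)
    return [population_share[c] / (counts[c] / n) for c in cells]

# Toy sample (made up): each entry is one respondent's demographic cell.
cells = ["f_low", "f_low", "f_low", "f_high",
         "m_low", "m_high", "m_high", "m_high"]
# Assumed known population joint distribution over the same cells.
population_share = {"f_low": 0.25, "f_high": 0.25, "m_low": 0.25, "m_high": 0.25}

weights = poststrat_weights(cells, population_share)

# Check: after weighting, each cell's share of the total weight
# should equal its population share.
total = sum(weights)
weighted_share = {
    c: sum(w for w, ci in zip(weights, cells) if ci == c) / total
    for c in population_share
}
```

Respondents in over-represented cells get weights below one and those in under-represented cells get weights above one. A cell that is empty in the sample cannot be weighted this way.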

donaldRwilliams commented 3 years ago

Cool.

One caveat is that I am not sure how that would work for binary or ordinal data, but I know it is possible (the psych package takes weights).

Would implementing cov.wt for Gaussian data work for you?

That would be pretty straightforward, whereas the others would require altering the MCMC samplers.

fdabl commented 3 years ago

I would need it for the copula model. I guess that's difficult? If so, I'll give the more computationally intensive resampling approach I mentioned above a shot for the time being.

donaldRwilliams commented 3 years ago

I have an idea for how to do it, but want to confirm it is correct.

It seems that if the weights are known, they can be applied to the latent data while sampling?

I'll reach out to Peter Hoff and Joris to get their thoughts about that. If nothing else, I bet they have a solution.

I'll update here when I hear back

fdabl commented 3 years ago

Fantastic, thank you! I think this post-stratification business would be a great addition to BGGM ;-)

donaldRwilliams commented 3 years ago

Agreed !

Email sent :-)

donaldRwilliams commented 3 years ago

Well, looks like this is a tough question. The approach I mentioned kind of "feels" right... lol. But I'm not sure I would want to implement it without knowing it is correct. Still waiting on one more response, and hopefully a solution will emerge!

fdabl commented 3 years ago

Thanks for your efforts, Donny! Any updates on this by any chance?

donaldRwilliams commented 3 years ago

Unfortunately, it seems a bit more involved than I had thought, and it would take some time to work out. I'll try to revisit when I get more time!

fdabl commented 3 years ago

OK, sounds good. Thank you!

donaldRwilliams / BGGM

Incorporating survey weights #76