gergness / srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data

Joining separate `as_survey` objects together #148

Open themichjam opened 1 year ago

themichjam commented 1 year ago

I have complex survey data that has phase 1 and phase 2 weights. Is it possible to combine separate as_survey design objects together?

They all come from the same survey wave and have the same number of rows/observations, but some variables have their own specific weighting variable (e.g. "condition_2" has its own weighting variable "wt2_var").

Small example below (apologies it's not a reprex, I couldn't find an open df that had this issue):

```r
library(dplyr)
library(srvyr)

# phase 1 weighting
weighted <- df1 %>%
  as_survey(
    strata = s_var,
    id = id_var, nest = TRUE,
    weights = wt1_var
  )

# subset the specific variables that need the phase 2 weight
df1_sub <- df1 %>% select("condition_2", "s_var", "id_var", "wt2_var")

# phase 2 weighting for specific variables within df1
weighted_2 <- df1_sub %>%
  as_survey(
    strata = s_var,
    id = id_var, nest = TRUE,
    weights = wt2_var
  )
```

gergness commented 1 year ago

~I must confess I’ve never actually used it, but I think you should join them together as data.frames and then use http://gdfe.co/srvyr/reference/as_survey_twophase.html~

Sorry, my reading comprehension is bad at the end of my day. It's an interesting idea, but it's not currently possible.

themichjam commented 1 year ago

Is there a way to weight that one specific variable with its different weighting var in the same as_survey object as the phase 1 object? Or not possible either?

bschneidr commented 1 year ago

Could you share a little about why there are different weights? Is it because one is a base weight and the other is raked/poststratified/nonresponse-adjusted?

In general, switching weights used for different variables isn't something covered by survey or srvyr. I'm not sure how common it is. The best example I can think of where a big survey does this is NHANES, where some variables use specialized weights. In these cases, normally folks just create multiple survey design objects. I wonder if it could be worth adding a helper function like set_weights() (along the lines of set_geometry() from the sf package) to allow the user to change the weights variable to something else in the data. It would get tricky for designs with raking/post-stratification and should probably not be allowed to work for replicate designs, but could maybe make sense for a basic linearization design.
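As a rough sketch of the "multiple survey design objects" approach mentioned above: build one design per weight variable, estimate each variable from the design whose weight applies to it, and combine the results afterwards. The example uses the `apistrat` data from the survey package; the second weight `pw2` is made up purely for illustration and has no statistical meaning.

```r
library(dplyr)
library(srvyr)

# Example data shipped with the survey package
data(api, package = "survey")

# Hypothetical second weight, invented for illustration only
dat <- apistrat %>%
  mutate(pw2 = pw * ifelse(stype == "E", 1.2, 0.9))

# One design object per weight variable; the rest of the design is identical
design_wt1 <- dat %>% as_survey_design(strata = stype, weights = pw)
design_wt2 <- dat %>% as_survey_design(strata = stype, weights = pw2)

# Estimate each variable with the design that carries its weight
est_api00 <- design_wt1 %>% summarise(api00 = survey_mean(api00))
est_api99 <- design_wt2 %>% summarise(api99 = survey_mean(api99))

# Combine the one-row results side by side
combined <- bind_cols(est_api00, est_api99)
combined
```

Estimates produced this way are computed separately, so this does not give a covariance between estimates that use different weights.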

bschneidr commented 1 year ago

If you have replicate weights, you can "stack" replicate designs with the 'svrep' package and compare estimates from the different designs.

https://bschneidr.github.io/svrep/reference/stack_replicate_designs.html

https://bschneidr.github.io/svrep/reference/svyby_repwts.html

That's useful for example if you want to compare estimates from different sets of weights. But this will only work correctly if you are given multiple sets of replicate weights from the data provider, or if you have the base weights and other design information needed to correctly create replicate weights.
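A minimal sketch of the stacking idea, assuming the survey and svrep packages are installed. It uses the `apiclus1` example data and creates the second set of weights by post-stratifying the replicate design (the population counts are the ones used in the survey package's own examples), rather than using weights supplied by a data provider. The names `rep_base`, `rep_adjusted`, and `which_design` are chosen here for illustration.

```r
library(survey)
library(svrep)

data(api, package = "survey")

# Base design, converted to a replicate-weights design
base_design <- svydesign(ids = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
rep_base <- as.svrepdesign(base_design)

# Second design: same replicates, with post-stratified weights
pop_types <- data.frame(stype = c("E", "H", "M"), Freq = c(4421, 755, 1018))
rep_adjusted <- postStratify(rep_base, ~stype, pop_types)

# Stack both designs; the .id column records which design each row came from
stacked <- stack_replicate_designs(
  "base" = rep_base,
  "adjusted" = rep_adjusted,
  .id = "which_design"
)

# Estimate the same mean under each set of weights
estimates <- svyby_repwts(
  rep_designs = list("base" = rep_base, "adjusted" = rep_adjusted),
  formula = ~api00,
  FUN = svymean
)
estimates
```

Both designs share the same replicates here because the second is derived from the first, which is what makes the comparison valid.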

themichjam commented 1 year ago

Yes, it's much like NHANES, where some variables dealing with certain physical/mental health conditions use specialised weights to account for non-response etc.

The svrep stacking idea looks interesting!