gergness / srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data

Variance-covariance matrix for a multivariate estimate #75

Open krivit opened 4 years ago

krivit commented 4 years ago

survey::svymean() returns an object of class svystat, which has a vcov() method to obtain not only the variances of the estimates but also their covariances. Is there a way to do that in srvyr?

As far as I can tell, the nearest replacement, srvyr::survey_mean() in summarize(), can take a matrix as its first argument, but it returns a tbl_df and does not provide a way to obtain covariances. Is there some other way to do that?

krivit commented 4 years ago

There is a mention of the difference in the vignette (https://cran.r-project.org/web/packages/srvyr/vignettes/srvyr-vs-survey.html), but does that mean that the only way to obtain covariances is to go back to survey?

gergness commented 4 years ago

Yes, that is currently true. I'm leaving this open because I can imagine revisiting it once I get my head around the advanced dplyr 1.0 features for returning multiple values, but this is a tricky issue.
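
For reference, a minimal sketch of the fallback to survey described above. It assumes the api example data that ships with survey and a stratified design specified with strata and weights only; the object names `dstrata_svy` and `est` are illustrative, not part of either package.

```r
library(survey)

# Example data shipped with the survey package
data(api, package = "survey")

# Stratified design; no finite population correction is specified here (an assumption)
dstrata_svy <- svydesign(ids = ~1, strata = ~stype, weights = ~pw, data = apistrat)

# Multivariate mean estimate: the result is a svystat object
est <- svymean(~api99 + api00, dstrata_svy)

# Full variance-covariance matrix of the two estimated means
vcov(est)

# Standard errors are the square roots of the diagonal
sqrt(diag(vcov(est)))
```

The off-diagonal entries of `vcov(est)` are the covariances that a summarize() call with survey_mean() does not return.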

gergness commented 3 years ago

Struggling with what the return would look like. I can imagine a wrapper function that made this easier, but I don't think it's the right return. Something about covariance matrices just doesn't seem to fit within a data.frame to me.

library(srvyr)
data(api, package = "survey")
dstrata <- apistrat %>% as_survey(strata = stype, weights = pw)

dstrata %>%
  summarize(
    api99_mn = survey_mean(api99),
    api00_mn = survey_mean(api00),
    api_cov = list(vcov(survey::svymean(cur_svy()$variables[, c("api99", "api00")], cur_svy())))
  )
#>   api99_mn api99_mn_se api00_mn api00_mn_se
#> 1 629.3948    10.09699 662.2874    9.536132
#>                                   api_cov
#> 1 101.94914, 94.28401, 94.28401, 90.93782

krivit commented 3 years ago

The way to prevent data.frame() from mangling lists of matrices is to enclose them in I(). For example,

data.frame(mat = I(list(diag(3), diag(2))))
#>            mat
#> 1 1, 0, 0,....
#> 2   1, 0, 0, 1

Alternatively, tibble doesn't mangle by default:

library(tibble)
tibble(mat = list(diag(3), diag(2)))
#> # A tibble: 2 x 1
#>   mat              
#>   <list>           
#> 1 <dbl[,3] [3 × 3]>
#> 2 <dbl[,2] [2 × 2]>

Is this what you are looking for?

gergness commented 3 years ago

No, but like, isn't it weird to have a matrix stuffed in a column like that? There's nothing attaching the matrix's rows/columns to the data.frame.

krivit commented 3 years ago

Doesn't strike me as particularly weird. It was always possible, if a bit awkward, to put complex objects, including other data frames, into cells of a data frame.

skolenik commented 2 years ago

Can you place it in the attr() of that tibble/data frame?
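
To make the question concrete, here is a rough sketch of the idea in base R, reusing the numbers printed earlier in this thread. The attribute name "vcov" and the object names are assumptions for illustration only; nothing in srvyr sets or reads such an attribute.

```r
# Point estimates copied from the output earlier in the thread
est <- data.frame(api99_mn = 629.3948, api00_mn = 662.2874)

# Covariance matrix copied from the api_cov output earlier in the thread
V <- matrix(
  c(101.94914, 94.28401,
    94.28401,  90.93782),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("api99", "api00"), c("api99", "api00"))
)

# Attach the matrix as an attribute (hypothetical name "vcov")
attr(est, "vcov") <- V

# Retrieve it again
attr(est, "vcov")

# Standard errors recovered from the diagonal
se <- sqrt(diag(attr(est, "vcov")))
se
```

The square roots of the diagonal agree with the api99_mn_se and api00_mn_se values shown above, up to rounding.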

gergness commented 2 years ago

Theoretically yes, but without some design they'd be hard to access and wouldn't behave as you'd expect.

Do you have code samples of what you currently do with the survey package and/or what you wish srvyr did?