gergness / srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data

`as_survey_design()` silently drops groups #178

Closed: rossellhayes closed this issue 1 month ago

rossellhayes commented 1 month ago

Calling `as_survey_design()` on data that has been grouped with `group_by()` silently drops the groups. This can lead to unexpected results. For example, in the following reprex one might expect the two results to be equal, but the second example returns only one row because `group_by()` is called before `as_survey_design()`.

library(srvyr)
library(survey)

data(api, package = "survey")

apistrat %>%
  as_survey_design(strata = stype, weights = pw) %>%
  mutate(api_diff = api00 - api99) %>% 
  group_by(stype) %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
#> # A tibble: 3 × 4
#>   stype api_diff api_diff_low api_diff_upp
#>   <fct>    <dbl>        <dbl>        <dbl>
#> 1 E        38.6         33.1          44.0
#> 2 H         8.46         1.74         15.2
#> 3 M        26.4         20.4          32.4

apistrat %>%
  group_by(stype) %>%
  as_survey_design(strata = stype, weights = pw) %>%
  mutate(api_diff = api00 - api99) %>% 
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
#> # A tibble: 1 × 3
#>   api_diff api_diff_low api_diff_upp
#>      <dbl>        <dbl>        <dbl>
#> 1     32.9         28.8         37.0

Created on 2024-08-14 with reprex v2.1.0

It would be great if either

  1. `as_survey_design()` gave a warning if called with grouped data, or
  2. `as_survey_design()` had an `as_survey_design.grouped_df()` method that preserved grouping.

I think preserving grouping would be the most user-friendly option, but giving a warning could also work if preserving grouping isn't feasible. I can work on a PR to implement either solution if it would be helpful!
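To make the two options concrete, here is a rough user-level sketch of each. Both helper names (`warn_if_grouped()` and `as_survey_design_keep_groups()`) are hypothetical and are not part of srvyr; this is only an illustration of the idea, not how the package does or should implement it.

```r
library(srvyr)

# Option 1 (hypothetical helper): warn when the input carries groups,
# then pass the data through unchanged.
warn_if_grouped <- function(.data) {
  if (dplyr::is_grouped_df(.data)) {
    warning("Input is grouped; as_survey_design() may drop the groups.")
  }
  .data
}

# Option 2 (hypothetical helper): record the grouping variables, build the
# design from the ungrouped data, then re-apply the grouping to the design.
as_survey_design_keep_groups <- function(.data, ...) {
  grouping <- dplyr::group_vars(.data)  # character(0) when ungrouped
  design <- as_survey_design(dplyr::ungroup(.data), ...)
  if (length(grouping) > 0) {
    # Splice the saved names back in as symbols for srvyr's group_by()
    design <- group_by(design, !!!rlang::syms(grouping))
  }
  design
}

data(api, package = "survey")

result <- apistrat %>%
  group_by(stype) %>%
  as_survey_design_keep_groups(strata = stype, weights = pw) %>%
  mutate(api_diff = api00 - api99) %>%
  summarise(api_diff = survey_mean(api_diff, vartype = "ci"))
```

With the second helper, `result` should have one row per `stype`, matching the first example in the reprex. Ungrouping before building the design and regrouping afterwards keeps the helper independent of whether `as_survey_design()` itself preserves groups.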

gergness commented 1 month ago

Caught me at a good time; this should be on CRAN soon.

rossellhayes commented 1 month ago

Fantastic, I definitely got lucky with that timing. Thanks!