gergness / srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data

Need a way to respect .drop when using interact #134

Open gergness opened 3 years ago

gergness commented 3 years ago

If you group by an interact() variable, .drop = FALSE is ignored: empty factor levels are always dropped, and there is currently no way to keep them.

library(srvyr)
data(api, package = "survey")
dstrata <- apistrat %>%
  as_survey_design(strata = stype, weights = pw)

# Plain grouping variable: .drop = FALSE keeps the empty H and M levels
dstrata %>%
  filter(stype == "E") %>% 
  group_by(stype, .drop = FALSE) %>%
  summarize(n = survey_total())
#> # A tibble: 3 × 3
#>   stype     n  n_se
#>   <fct> <dbl> <dbl>
#> 1 E     4421.     0
#> 2 H        0      0
#> 3 M        0      0

# Same grouping wrapped in interact(): the empty levels are dropped anyway
dstrata %>%
  filter(stype == "E") %>% 
  group_by(interact(stype), .drop = FALSE) %>%
  summarize(n = survey_total())
#> # A tibble: 1 × 3
#>   stype     n  n_se
#>   <fct> <dbl> <dbl>
#> 1 E     4421.     0

Created on 2021-11-07 by the reprex package (v2.0.1)

gergness commented 2 years ago

Initial attempts at this have failed because dplyr:::compute_groups (called from dplyr::group_by) recreates the factor vector without considering additional metadata.

It might be worth asking the tidyverse team whether metadata could be preserved here. I don't fully understand this code, but it seems like there should be a cleaner approach that preserves metadata, which would also help with the factor vs. ordered distinction: https://github.com/tidyverse/dplyr/blob/86a8455fed6b763927be06a0fbe685444442bc9f/R/grouped-df.r#L78
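
To illustrate the general mechanism described above, here is a minimal base R sketch of how rebuilding a factor from its values and levels loses anything extra attached to it. This is not dplyr's actual code; the class name `demo_interaction` and the `crosswalk` attribute are hypothetical stand-ins and are not srvyr's real internals.

```r
# Hypothetical factor subclass carrying extra metadata
# (NOT srvyr's real interact() representation).
x <- structure(
  factor(c("E", "E"), levels = c("E", "H", "M")),
  class = c("demo_interaction", "factor"),
  crosswalk = "extra metadata"
)

# Rebuilding the factor from its values and levels keeps the levels,
# but the custom class and the extra attribute are gone.
rebuilt <- factor(as.character(x), levels = levels(x))
class(rebuilt)               # "factor"
attr(rebuilt, "crosswalk")   # NULL
levels(rebuilt)              # "E" "H" "M"

# The same kind of rebuild also loses the factor vs. ordered distinction.
o <- factor(c("lo", "hi"), levels = c("lo", "hi"), ordered = TRUE)
rebuilt_o <- factor(as.character(o), levels = levels(o))
is.ordered(rebuilt_o)        # FALSE
```

If the grouping code rebuilds factors in a comparable way, any fix on the srvyr side would probably have to restore the metadata after grouping, unless dplyr itself starts preserving it.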