tharkanen closed this 4 months ago
Hi @tharkanen, you're in good company: this is a recurring point of confusion for users (#161).
The answer, though, is that if you want to remove missing values from a grouping variable used in group_by(), you have to filter them out before calling group_by() or summarize(). The GitHub version of the package has documentation that explains this, but the CRAN version doesn't include it yet.
Here's what that documentation looks like: https://github.com/gergness/srvyr/pull/161/files
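For reference, a minimal sketch of the filter-first approach. It assumes the apistrat example data from the survey package and a stratified design built with as_survey_design(). The rows set to NA are a hypothetical choice made only for illustration, not the ones from the original report.

```r
library(survey)  # provides the api example data sets
library(srvyr)   # attaches dplyr, whose verbs (filter, group_by, summarize) are used below

data(api)

# Hypothetical setup: blank out some values of the grouping variable
apistrat_na <- apistrat
apistrat_na$awards[seq(1, nrow(apistrat_na), by = 10)] <- NA

strat_design_srvyr <- apistrat_na |>
  as_survey_design(strata = stype, weights = pw)

# Without the filter, NA is treated as its own group,
# so the No and Yes proportions do not sum to 1
with_na <- strat_design_srvyr |>
  group_by(awards) |>
  summarize(m = survey_mean())

# Dropping the missing values before group_by() leaves only No and Yes
without_na <- strat_design_srvyr |>
  filter(!is.na(awards)) |>
  group_by(awards) |>
  summarize(m = survey_mean())

print(with_na)
print(without_na)
```

Here na.rm = TRUE inside survey_mean() would only affect missing values in the variable being summarized, not in the grouping variable, which is why the explicit filter() is needed.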
Thanks a lot @bschneidr! These instructions give the same results without the NA category:
> strat_design_srvyr |> summarize(m=survey_mean(awards=="No", na.rm=TRUE))
# A tibble: 1 × 2
      m   m_se
  <dbl>  <dbl>
1 0.368 0.0365
> strat_design_srvyr |> summarize(m=survey_mean(awards=="Yes", na.rm=TRUE))
# A tibble: 1 × 2
      m   m_se
  <dbl>  <dbl>
1 0.632 0.0365
> strat_design_srvyr |> filter(!is.na(awards)) |> group_by(awards) |> summarize(m= survey_mean(na.rm=TRUE))
# A tibble: 2 × 3
  awards     m   m_se
  <fct>  <dbl>  <dbl>
1 No     0.368 0.0365
2 Yes    0.632 0.0365
I created some missing values in the example data:
I expected na.rm=TRUE to drop the NA category from the output, which is what I need, so that No + Yes would sum to 100% rather than 90.5%. Is there a solution to this?