gergness / srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data

Problems with NA values in `survey_prop()` #158

Closed WaceroRuge closed 1 year ago

WaceroRuge commented 1 year ago

Hi,

I was recently trying to estimate proportions for a discrete variable that contains some NA values. I added the argument `na.rm = TRUE` to `srvyr::survey_prop()`, but it doesn't seem to have any effect. The example below illustrates the issue.

library(srvyr)
library(survey)

data(api)

# Introduce missing values into the variable of interest
apiclus1$stype[1:10] <- NA

# Build the design once the NAs are in place, then convert it for srvyr
dclus1 <- svydesign(id = ~dnum, weights = ~pw, data = apiclus1, fpc = ~fpc)
dclus1 <- dclus1 %>% as_survey_design()

dclus1 %>% group_by(stype) %>% summarise(prop = survey_prop())
#> When `proportion` is unspecified, `survey_prop()` now defaults to `proportion = TRUE`.
#> ℹ This should improve confidence interval coverage.
#> This message is displayed once per session.
#> # A tibble: 4 × 3
#>   stype   prop prop_se
#>   <fct>  <dbl>   <dbl>
#> 1 E     0.743   0.0712
#> 2 H     0.0710  0.0267
#> 3 M     0.131   0.0289
#> 4 <NA>  0.0546  0.0559

dclus1 %>% group_by(stype) %>% summarise(prop = survey_prop(na.rm = TRUE))
#> # A tibble: 4 × 3
#>   stype   prop prop_se
#>   <fct>  <dbl>   <dbl>
#> 1 E     0.743   0.0712
#> 2 H     0.0710  0.0267
#> 3 M     0.131   0.0289
#> 4 <NA>  0.0546  0.0559

bschneidr commented 1 year ago

Thanks for the reproducible example. Please see https://github.com/gergness/srvyr/issues/149, which concerns precisely this question and provides a solution.

WaceroRuge commented 1 year ago

Thanks for the fast reply! Indeed, I had already filtered out the missing values before calling `summarise()`, but I think this behavior is somewhat confusing.
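
For readers landing here, the workaround mentioned in this thread (dropping the missing values before grouping) might look like the sketch below. It assumes the same `apiclus1` data as the example above and builds the design directly with `as_survey_design()` instead of going through `svydesign()`; the name `props` is only illustrative. A likely reason `na.rm = TRUE` has no visible effect is that `stype` is the grouping variable here, so its NA values already form their own group before `survey_prop()` runs. Treat that explanation as an assumption and see the linked issue for the maintainers' discussion.

```r
library(survey)  # provides the `api` example data
library(srvyr)
library(dplyr)   # loaded explicitly so filter() is the dplyr generic

data(api)

# Same setup as in the report: make the first 10 values of stype missing
apiclus1$stype[1:10] <- NA

# Build a srvyr design straight from the data frame
dclus1 <- apiclus1 %>%
  as_survey_design(ids = dnum, weights = pw, fpc = fpc)

# Drop rows with a missing grouping value BEFORE group_by(), so that NA
# never becomes a group and proportions are taken over non-missing rows
props <- dclus1 %>%
  filter(!is.na(stype)) %>%
  group_by(stype) %>%
  summarise(prop = survey_prop())

print(props)
```

Filtering the survey object, rather than the raw data frame before the design is built, keeps the full design information available for variance estimation, which is usually what is wanted for subpopulation estimates.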