gergness / srvyr

R package to add 'dplyr'-like Syntax for Summary Statistics of Survey Data

Add degf option to `as_survey_rep()` #171

Closed: bschneidr closed this 4 months ago

bschneidr commented 8 months ago

This PR brings `as_survey_rep()` up to date with a change to `svrepdesign()` in the latest release of the 'survey' package.

https://cran.r-project.org/web/packages/survey/NEWS

allow user to specify degf= in svrepdesign to avoid needing to compute it (for Ben Schneider)

The updates include:

  1. Adding `degf` as an argument to `as_survey_rep()`, placed right after `mse` in the argument list, with accompanying roxygen2 documentation
  2. Adding a small unit test
  3. Updating the NEWS.md file
  4. Adding a couple of sentences to the databases vignette, since this update is useful for large datasets, which are, I think, the main motivation for using a database-backed survey.
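A minimal usage sketch of the new argument follows. The data frame, its column names (`y`, `wt`, `rep1` to `rep3`) and all weight values are made up for illustration, and the sketch assumes installed versions of 'srvyr' and 'survey' that include this change.

```r
library(srvyr)

# Hypothetical toy data: 6 respondents in 3 jackknife groups of 2.
# `wt` is the full-sample weight; rep1..rep3 are JK1 replicate weights
# (the dropped group gets 0, the others are scaled by 3/2). All values are made up.
toy <- data.frame(
  y    = c(2, 4, 3, 5, 6, 1),
  wt   = c(1, 1, 1, 1, 1, 1),
  rep1 = c(0, 0, 1.5, 1.5, 1.5, 1.5),
  rep2 = c(1.5, 1.5, 0, 0, 1.5, 1.5),
  rep3 = c(1.5, 1.5, 1.5, 1.5, 0, 0)
)

# Passing `degf` supplies the design degrees of freedom directly, so
# svrepdesign() does not have to compute them from the replicate weights.
# 3 linearly independent replicates give 3 - 1 = 2.
toy_design <- as_survey_rep(
  toy,
  weights    = wt,
  repweights = c(rep1, rep2, rep3),
  type       = "JK1",
  mse        = TRUE,
  degf       = 2
)

# Should report 2: for this toy design the supplied value and the
# rank-based value coincide.
survey::degf(toy_design)
```

For a small design like this one the saving is negligible; the argument matters when the replicate-weight matrix has millions of rows.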

Background

The motivation for this change in 'survey' is that when Thomas Lumley updated degf() in version 3.3 of 'survey', the change was a good idea from a statistical perspective but caused problems with large datasets, because it requires an expensive matrix decomposition of the replicate weights. With large datasets such as ACS microdata, svrepdesign() could crash, or take a lot of time and memory, just to calculate the design degrees of freedom. These updates to 'survey' and 'srvyr' let the user bypass that calculation by specifying the degrees of freedom manually.
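To make the cost concrete: as we read the 'survey' documentation, for replicate-weight designs degf() is the numerical rank of the replicate-weight matrix minus one. The base-R sketch below shows that calculation with `qr()` on made-up random weights. The helper `rank_minus_one()` is hypothetical and is not the package's implementation. Because a QR decomposition of an n-by-R matrix scales roughly with n times R squared, this is the kind of step that a user-supplied `degf` lets `svrepdesign()` skip.

```r
# Illustrative sketch only (hypothetical helper, not the 'survey' code):
# design degrees of freedom of a replicate design, taken as the numerical
# rank of the n x R replicate-weight matrix, minus one.
rank_minus_one <- function(repweights, tol = 1e-5) {
  # qr() performs a pivoted QR decomposition; its cost grows with
  # n * R^2, which is what hurts when n is in the millions
  qr(repweights, tol = tol)$rank - 1
}

set.seed(2024)
n_obs <- 5000
n_rep <- 80  # e.g. the number of replicate weights on ACS files

# Made-up positive replicate weights; random columns are numerically full rank
repw <- matrix(rexp(n_obs * n_rep), nrow = n_obs, ncol = n_rep)
rank_minus_one(repw)      # 79

# Redundant columns add work but not degrees of freedom
repw_dup <- cbind(repw, repw[, 1:10])
rank_minus_one(repw_dup)  # still 79, despite 90 columns
```

The second call shows why degf() cannot simply be "number of replicates minus one" without inspecting the weights, and hence why the decomposition is needed unless the user already knows the answer.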

bschneidr commented 8 months ago

I think the checks are failing because they install an older version of 'survey': the updated 'survey' only started rolling out on CRAN yesterday. So maybe we should give it a couple of days before re-running the checks.

gergness commented 4 months ago

Awesome, thanks!