ColbyStatSvyRsch / surveyCV

R package {surveyCV}: K-fold cross-validation for complex sample survey designs, and associated paper (https://doi.org/10.1002/sta4.454)

Wrap `cv.svy()` around `cv.svydesign()` to let us work directly with database-backed objects #5

Open civilstat opened 2 years ago

civilstat commented 2 years ago

@bschneidr wrote:

> I think if the package design was changed so that `cv.svy()` was a wrapper around `cv.svydesign()` rather than the other way around, it could more easily handle raking/post-stratification/calibration. But I think that's an issue for another time, as it's not completely clear whether/how calibration should be taken into account, as you mention in the paper.

I believe that swap would also help with database-backed objects.
Right now, thanks to Ben's updates, we can accept a `DBIsvydesign`, but we immediately convert it into a data frame, which might be problematic for large databases.
If we instead used `subset()` and `transform()` on the original `svydesign` object to get training sets etc. (rather than subsetting the data frame and creating new `svydesign` objects), then when users gave us a `DBIsvydesign`, R could presumably use {survey}'s internal tricks and keep everything in the database as much as possible.
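
To make the idea concrete, here is a minimal sketch of one CV loop that only ever manipulates the design object. It is not the package's current implementation, and it runs on an ordinary in-memory design built from the `api` example data in {survey}, so it does not show that the same calls stay inside the database for a `DBIsvydesign` (that is what the TODO below is for). The column names `fold` and `sq_err`, the model `api00 ~ ell`, and the simple cluster-level fold assignment are all illustrative assumptions. It uses `update()`, which {survey} documents for adding variables to a design.

```r
library(survey)

data(api)  # example data shipped with {survey}

# One-stage cluster sample of school districts (in-memory, not database-backed)
des <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Assumption: a simplified fold assignment that keeps whole clusters together
set.seed(2023)
nfolds <- 5
cluster_ids <- unique(apiclus1$dnum)
cluster_fold <- sample(rep(seq_len(nfolds), length.out = length(cluster_ids)))
fold_id <- cluster_fold[match(apiclus1$dnum, cluster_ids)]

# Attach the fold labels to the design itself (hypothetical column name `fold`)
des <- update(des, fold = fold_id)

fold_mse <- sapply(seq_len(nfolds), function(k) {
  # Training and test sets are subsets of the original design,
  # not new svydesign objects built from a subsetted data frame
  train <- subset(des, fold != k)
  test  <- subset(des, fold == k)

  # Fit on the training design
  fit <- svyglm(api00 ~ ell, design = train)
  b <- coef(fit)

  # Squared prediction error as a new variable on the test design
  test <- update(test, sq_err = (api00 - (b[["(Intercept)"]] + b[["ell"]] * ell))^2)

  # Design-weighted mean squared error for this test fold
  unname(coef(svymean(~sq_err, test)))
})

print(fold_mse)
```

If this pattern also works when `des` is a `DBIsvydesign`, the data frame conversion could be dropped; whether `subset()` and `update()` actually defer the work to the database in that case still needs to be checked.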

TODO: Test out this idea on a database-backed object. Incorporate into the package if it does indeed work.