ColbyStatSvyRsch / surveyCV

R package {surveyCV}: K-fold cross-validation for complex sample survey designs, and associated paper (https://doi.org/10.1002/sta4.454)

Add the newer design-based CV methods of Iparragirre et al. 2023 #8

Open civilstat opened 8 months ago

civilstat commented 8 months ago

"Variable selection with LASSO regression for complex survey data,"
Iparragirre et al. (2023), Stat
https://onlinelibrary.wiley.com/doi/full/10.1002/sta4.578

Review and incorporate the method they call dCV:

Note that our goal is to get at least one PSU from each stratum in every training set (not in every fold), and this will happen as long as no stratum ends up with all its PSUs in the same fold. [...] at least two PSUs per stratum are needed, each of them classified in a different fold. This is an advantage over the method proposed by Wieczorek et al. (2022), which requires at least K PSUs per stratum.
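A rough sketch of the condition quoted above, in Python for illustration only. This is not surveyCV code, and it is not necessarily the fold-assignment procedure Iparragirre et al. use; it is one simple way to guarantee that no stratum has all of its PSUs in the same fold. The function name `assign_psu_folds` and the `psu_strata` mapping (PSU id to stratum id) are hypothetical.

```python
import random
from collections import defaultdict


def assign_psu_folds(psu_strata, k, seed=None):
    """Assign each PSU to one of k folds (numbered 0..k-1).

    psu_strata: dict mapping PSU id -> stratum id (hypothetical input format).
    Within each stratum, PSUs are shuffled and dealt to consecutive folds,
    so any stratum with at least 2 PSUs is spread over at least 2 folds.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)

    # Group PSU ids by stratum.
    by_stratum = defaultdict(list)
    for psu, stratum in psu_strata.items():
        by_stratum[stratum].append(psu)

    folds = {}
    for stratum, psus in by_stratum.items():
        if len(psus) < 2:
            raise ValueError(
                f"Stratum {stratum!r} has only {len(psus)} PSU; "
                "at least 2 PSUs per stratum are needed."
            )
        rng.shuffle(psus)
        # A random starting fold per stratum avoids always loading fold 0.
        offset = rng.randrange(k)
        for i, psu in enumerate(psus):
            folds[psu] = (offset + i) % k
    return folds


# Example: stratum "A" has only 2 PSUs although k = 5.
example = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B"}
example_folds = assign_psu_folds(example, k=5, seed=1)
```

Because the first two PSUs dealt in a stratum always land in different folds, every training set (all folds but one) keeps at least one PSU from each stratum. One caveat of this sketch: with few strata and few PSUs, some folds can end up empty, which a real implementation would need to handle.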

Finally, improve our documentation and error messages so that package users can see why a call fails when a stratum has too few PSUs, and suggest what they can do about it.
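To make the kind of message concrete, here is a hedged Python sketch of a pre-check that reports which strata are the problem. The function name `check_psus_per_stratum`, the input format, and the wording of the suggestions (collapsing strata, lowering K) are assumptions for illustration, not existing surveyCV behavior.

```python
from collections import Counter


def check_psus_per_stratum(psu_strata, k):
    """Return a list of human-readable messages, one per problematic stratum.

    psu_strata: dict mapping PSU id -> stratum id (hypothetical input format).
    k: number of folds requested.
    """
    counts = Counter(psu_strata.values())
    messages = []
    # Sort by the string form of the stratum id for a stable report order.
    for stratum, n in sorted(counts.items(), key=lambda kv: str(kv[0])):
        if n < 2:
            messages.append(
                f"Stratum {stratum!r} has {n} PSU: no fold assignment can keep "
                "it in every training set. Consider collapsing it with a "
                "similar stratum."
            )
        elif n < k:
            messages.append(
                f"Stratum {stratum!r} has {n} PSUs but K = {k}: a method that "
                "needs at least K PSUs per stratum will fail. Consider a "
                "smaller K or a method that needs only 2 PSUs per stratum."
            )
    return messages


example = {"p1": "A", "p2": "A", "p3": "B", "p4": "C", "p5": "C", "p6": "C"}
report = check_psus_per_stratum(example, k=3)
```

Here `report` holds one message for stratum "A" (2 PSUs, fewer than K) and one for stratum "B" (a single PSU); stratum "C" passes.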