Note that our goal is to get at least one PSU from each stratum in every training set (not in every fold), and this will happen as long as no stratum ends up with all its PSUs in the same fold. [...] at least two PSUs per stratum are needed, each of them classified in a different fold. This is an advantage over the method proposed by Wieczorek et al. (2022), which requires at least K PSUs per stratum.
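To make the condition in the quote concrete, here is a minimal Python sketch of one way to assign PSUs to folds so that no stratum has all of its PSUs in the same fold. It is not necessarily the exact dCV procedure from the paper. The function name `assign_psu_folds` and the dict-based input format are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict


def assign_psu_folds(psu_strata, n_folds, seed=None):
    """Return a dict mapping each PSU id to a fold index in range(n_folds).

    psu_strata: dict mapping PSU id -> stratum label (hypothetical format).
    Within each stratum, PSUs are shuffled and dealt out over a randomly
    ordered list of folds, so the first two PSUs of a stratum always land
    in two different folds.
    """
    if n_folds < 2:
        raise ValueError("n_folds must be at least 2")
    rng = random.Random(seed)

    # Group PSU ids by stratum.
    by_stratum = defaultdict(list)
    for psu, stratum in psu_strata.items():
        by_stratum[stratum].append(psu)

    folds = {}
    for stratum, psus in by_stratum.items():
        if len(psus) < 2:
            raise ValueError(f"stratum {stratum!r} has fewer than 2 PSUs")
        rng.shuffle(psus)
        fold_order = list(range(n_folds))
        rng.shuffle(fold_order)
        # Round-robin over the shuffled fold order.
        for i, psu in enumerate(psus):
            folds[psu] = fold_order[i % n_folds]
    return folds


example = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "b3": "B"}
folds = assign_psu_folds(example, n_folds=5, seed=1)
```

Because each stratum is spread over at least two folds, dropping any single fold still leaves at least one PSU from every stratum in the training set. This sketch does not try to balance fold sizes, and with few PSUs some folds may end up empty.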
Finally, also improve our documentation & error messages, so that it's clearer to our package users why things fail when there are too few PSUs per stratum (and give suggestions for what can be done about it).
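As a rough illustration of the kind of error message meant here, the Python sketch below checks the number of PSUs per stratum and names the offending strata along with possible remedies. The function name `check_psus_per_stratum`, the `min_psus` parameter, and the wording of the suggestions are all assumptions for illustration, not existing package behavior.

```python
from collections import Counter


def check_psus_per_stratum(psu_strata, min_psus=2):
    """Raise a descriptive ValueError if any stratum has too few PSUs.

    psu_strata: dict mapping PSU id -> stratum label (hypothetical format).
    min_psus: 2 matches the condition quoted above; a method that needs
    K PSUs per stratum would pass the number of folds instead.
    """
    counts = Counter(psu_strata.values())
    too_few = {s: n for s, n in counts.items() if n < min_psus}
    if too_few:
        details = ", ".join(
            f"stratum {s!r} has {n}"
            for s, n in sorted(too_few.items(), key=lambda kv: str(kv[0]))
        )
        raise ValueError(
            f"Cannot form folds: every stratum needs at least {min_psus} "
            f"PSUs, but {details}. Possible fixes: combine sparse strata "
            "with similar ones before cross-validating, or use fewer folds "
            "if the required minimum depends on the number of folds."
        )
    return counts


try:
    check_psus_per_stratum({"a1": "A", "a2": "A", "b1": "B"})
except ValueError as err:
    message = str(err)
```

Listing the offending strata by name lets users see which part of their design triggered the failure instead of having to count PSUs themselves.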
"Variable selection with LASSO regression for complex survey data," Iparragirre et al. (2023), Stat: https://onlinelibrary.wiley.com/doi/full/10.1002/sta4.578
Review and incorporate the method they call dCV.