bschneidr / fastsurvey

A fork of the `survey` R package, using {Rcpp}

`survey.lonely.psu = 'adjust'` mismatch between base and C++, for multistage sample #9

Closed bschneidr closed 1 year ago

bschneidr commented 1 year ago

From Thomas:

Also, the comparison with the R code in tests/multistage_rcpp fails in one place. I've got your fix for survey.lonely.psu="adjust", but that seems to only fix the one-stage case, not the multistage case.

    survey.lonely.psu one.stage results_match
  1         certainty      TRUE          TRUE
  2            remove      TRUE          TRUE
  3           average      TRUE          TRUE
  4            adjust      TRUE          TRUE
  5         certainty     FALSE          TRUE
  6            remove     FALSE          TRUE
  7           average     FALSE          TRUE
  8            adjust     FALSE         FALSE

bschneidr commented 1 year ago

It turns out this was an issue with the new R code added to onestage(), and it only occurred when the stratification variable (i.e., strata, inside onestage()) was encoded as a factor. It is fixed by coercing strata to a numeric (non-factor) vector.

9c9abd2

The issue is that, when strata is a factor, the tapply() call using head() produced an entry for every possible value of strata based on the list of factor levels, and not just for the strata actually observed in a given subset of the data. The strata that were not observed got NA values, so sum(...) returned NA, which made the entire result of multistage() NA.
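For anyone who wants to see the behavior in isolation, here is a minimal sketch. The data and variable names (`x`, `strata_sub`, and so on) are made up for illustration, and this is not the actual code in onestage() or in the commit; it only shows how tapply() treats unobserved factor levels and how coercing to numeric avoids the NA.

```r
# Hypothetical data: a stratification variable stored as a factor with
# three levels, and one value per row to summarise within each stratum.
strata <- factor(c("A", "A", "B", "C"), levels = c("A", "B", "C"))
x <- c(10, 20, 30, 40)

# Take a subset in which stratum "C" is not observed.
# Subsetting a factor keeps all of its levels, including "C".
keep <- strata != "C"
strata_sub <- strata[keep]
x_sub <- x[keep]

# With a factor index, tapply() returns one cell per level,
# so the unobserved level "C" comes back as NA.
by_factor <- tapply(x_sub, strata_sub, head, 1)
sum(by_factor)   # NA

# Coercing to a numeric (non-factor) vector restricts the result
# to the strata that are actually present in the subset.
by_numeric <- tapply(x_sub, as.numeric(strata_sub), head, 1)
sum(by_numeric)  # 40
```

Dropping unused levels with droplevels() would probably avoid the NA as well; the fix in this issue uses the numeric coercion instead.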

With this commit, we now get the expected results:

    survey.lonely.psu one.stage results_match
  1         certainty      TRUE          TRUE
  2            remove      TRUE          TRUE
  3           average      TRUE          TRUE
  4            adjust      TRUE          TRUE
  5         certainty     FALSE          TRUE
  6            remove     FALSE          TRUE
  7           average     FALSE          TRUE
  8            adjust     FALSE          TRUE