bschneidr / fastsurvey

A fork of the `survey` R package, using {Rcpp}

Throwing singleton error for second stage in a with-replacement design #7

Closed bschneidr closed 1 year ago

bschneidr commented 1 year ago

When building the "epi.Rnw" vignette, Thomas noticed the following issue:

First, it throws an error in chunk 9 of vignettes/epi.Rnw

(dBarlow<-svydesign(id=~seqno+eventrec, strata=~in.subcohort+rel,
                    data=nwtco.expd, weight=~pwts))
svycoxph(Surv(start,stop,event)~factor(stage)+factor(histol)+I(age/12),
                design=dBarlow)

gives:

Error: processing vignette 'epi.Rnw' failed with diagnostics:
 chunk 9
Error in eval(expr, .GlobalEnv) :
  At least one stratum contains only one PSU at stage 2

This isn't necessary because the design has sampling with replacement and the later stages don't matter. I'll look at why the C++ code and R code are different.

I've looked into this. The C++ code is doing variance computations that are correct but unnecessary for multistage samples where at least one of the earlier stages is with-replacement. Whenever a stage is sampled with replacement, the subsequent stages don't matter, so the C++ code shouldn't do any variance computations for them. Right now it does, which appears to be why the single-PSU check gets reached at stage 2.

bschneidr commented 1 year ago

Addressed by the following update to the C++ code:

cf7a179

This means that if a given stratum has with-replacement sampling, then no variance calculations will be done for later sampling stages within that stratum.
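
To illustrate the rule, here is a minimal sketch of the stopping logic. It is not the code from cf7a179: the `StageInfo` struct and the `check_stages` function are hypothetical names made up for this example, and the sketch treats a single stratum in isolation rather than handling later-stage strata nested within earlier-stage PSUs.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical description of one sampling stage within a single stratum.
// (Illustrative only; not a type from fastsurvey or survey.)
struct StageInfo {
    std::size_t n_psus;     // number of sampling units observed at this stage
    bool with_replacement;  // true if this stage is sampled with replacement
};

// Walk the stages in order and return how many of them would need a
// variance computation. The single-PSU check is applied only to stages
// that are actually evaluated, and the loop stops after the first
// with-replacement stage because later stages don't matter.
std::size_t check_stages(const std::vector<StageInfo>& stages) {
    std::size_t evaluated = 0;
    for (std::size_t s = 0; s < stages.size(); ++s) {
        if (stages[s].n_psus < 2) {
            throw std::runtime_error(
                "At least one stratum contains only one PSU at stage " +
                std::to_string(s + 1));
        }
        ++evaluated;
        if (stages[s].with_replacement) {
            break;  // skip all later stages in this stratum
        }
    }
    return evaluated;
}
```

With this logic, a stratum whose first stage is with-replacement and whose second stage has a single unit (the situation in the vignette) is evaluated for stage 1 only and raises no error. The same second stage under a without-replacement first stage would still raise the single-PSU error.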