Closed bschneidr closed 1 year ago
Related: It seems the fpc is not applied if weights
is supplied.
data(api, package = "survey")
apiclus1$fpc <- 18
dclus1 <- svydesign(data = apiclus1,
id = ~dnum, weights = ~pw, fpc=~fpc)
# Create replicate-weights survey design
orig_rep_design <- as_bootstrap_design(dclus1, replicates = 5000,
type = "Rao-Wu-Yue-Beaumont")
reps <- withReplicates(orig_rep_design, function(w, data) {
weighted.mean(data$api99, w)
}, return.replicates=T)
r <- reps$replicates
print(quantile(r, 0.025))
print(quantile(r, 0.975))
confint(svymean(~api99, orig_rep_design), df=degf(orig_rep_design))
boots <- as.svrepdesign(dclus1, type="bootstrap",replicates=5000)
reps <- withReplicates(boots, function(w, data) {
weighted.mean(data$api99, w)
}, return.replicates=T)
r <- reps$replicates
print(quantile(r, 0.025))
print(quantile(r, 0.975))
confint(svymean(~api99, boots), df=degf(boots))
> print(quantile(r, 0.025))
2.5%
560.2901
> print(quantile(r, 0.975))
97.5%
651.425
> confint(svymean(~api99, orig_rep_design), df=degf(orig_rep_design))
2.5 % 97.5 %
api99 556.2206 657.7357
>
> boots <- as.svrepdesign(dclus1, type="bootstrap",replicates=5000)
> reps <- withReplicates(boots, function(w, data) {
+ weighted.mean(data$api99, w)
+ }, return.replicates=T)
>
> r <- reps$replicates
> print(quantile(r, 0.025))
2.5%
599.1006
> print(quantile(r, 0.975))
97.5%
637.5235
> confint(svymean(~api99, boots), df=degf(boots))
2.5 % 97.5 %
api99 584.6227 629.3336
Hi @petulla,
Thanks for providing the reproducible example.
After reviewing this report, I don't think there's an issue with the 'svrep' code. Rather, I think the input data in this example is simply invalid.
In the example provided, the weights are inconsistent with the fpc
value supplied. The weights used for constructing a bootstrap replicate should be the base sampling weights: that is, the weight should be the inverse of the inclusion probability. In the apiclus1
dataset, this should be 757/15
, which is the number of population clusters divided by the number of sample clusters. However, Dr. Lumley notes in the documentation for the 'survey' package that the weights pw
in the apiclus1
dataset are incorrect (see the "Details" section here.) And then when you set the fpc
to 18
without updating the weights accordingly, that produced another inconsistency between the weights and the finite population corrections.
In other words, the reason you're getting unexpected results is that you've supplied invalid sampling weights or an invalid fpc
, depending on which input one might arbitrarily decide is the definitive input to use for determining sampling probabilities. I think it makes sense to treat the sampling weights as the definitive input for determining sampling probabilities, since that works for both simple random sampling and unequal probability sampling, and that's what weights are supposed to represent.
Does that make sense and resolve things?
Indeed, makes sense! Thanks for the reply. I somewhere along the way got confused on the psu population count. As you mentioned, Lumely explains that they are incorrectly labeled in the dataset (and so are the weights, accordingly).
Running with appropriate values gives good results.
data(api, package = "survey")
apiclus1$fpc <- 757
apiclus1$pw <- 757/15
dclus1 <- svydesign(data = apiclus1,
id = ~dnum, weights = ~pw, fpc=~fpc)
# Create replicate-weights survey design
orig_rep_design <- as_bootstrap_design(dclus1, replicates = 5000,
type = "Rao-Wu-Yue-Beaumont")
reps <- withReplicates(orig_rep_design, function(w, data) {
weighted.mean(data$api99, w)
}, return.replicates=T)
r <- reps$replicates
print(quantile(r, 0.025))
print(quantile(r, 0.975))
confint(svymean(~api99, orig_rep_design), df=degf(orig_rep_design))
boots <- as.svrepdesign(dclus1, type="bootstrap",replicates=5000)
reps <- withReplicates(boots, function(w, data) {
weighted.mean(data$api99, w)
}, return.replicates=T)
r <- reps$replicates
print(quantile(r, 0.025))
print(quantile(r, 0.975))
confint(svymean(~api99, boots), df=degf(boots))
2.5%
559.8802
97.5%
651.7246
2.5 % 97.5 %
api99 555.2295 658.7268
2.5%
562.291
97.5%
648.8981
2.5 % 97.5 %
api99 555.8275 658.1288
Bug description
The bootstrap conversion with
as_bootstrap_design()
fails for survey design objects specified a certain way.Created on 2022-12-13 with reprex v2.0.2
Bug explanation
Apparently the creation of the "allprob" element of a survey design object depends on whether the user uses the argument
weights
when specifying the design. If the user doesn't specifyweights
, then the 'survey' package automatically figures out the weights based on sampling fractions at each stage, and these are then used to constructallprob
. But if the user specifies weights, thenallprob
is simply the inverse of the weights.