Closed by bfjarvis 5 years ago
Glad that you're finding the package useful!
I assume you're referring to these lines? The code takes N samples from the unit-group combinations, using the frequencies as weights, where N is the individual sample size. I could've expanded the data frame to individual cases, but that's less efficient computationally. So basically it's a bootstrap based on the individual observations. Not sure if that answers your question, but I think that should be right.
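To make the procedure described above concrete, here is a minimal sketch in base R. The toy data, the column names `unit`, `group`, and `freq`, and the variable names are assumptions made for illustration; this is not the package's actual code.

```r
# Hypothetical toy data: one row per unit-group combination with its count.
d <- data.frame(
  unit  = c("A", "A", "B", "B"),
  group = c("x", "y", "x", "y"),
  freq  = c(30, 10, 5, 55)
)
n_total <- sum(d$freq)  # N, the individual sample size

set.seed(1)
# Draw N cells with replacement, weighting each cell by its frequency.
idx <- sample(nrow(d), size = n_total, replace = TRUE, prob = d$freq)

# Each draw stands for one individual; tabulate the draws per cell to get
# back to counts at the unit-group level.
boot <- d[, c("unit", "group")]
boot$freq <- tabulate(idx, nbins = nrow(d))
```

The resampled counts in `boot$freq` always sum to `n_total`, which is why this is a bootstrap over individuals even though the data frame is never expanded to one row per person.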
I see. I may have just misunderstood the underlying code. I guess when you resample, you return what are essentially individual observations, and this bit, `[list(freq = .N), by = vars]`, collapses those back down to counts at the group and unit level.
For my part, I've been trying to wrap my head around whether this process makes sense when a particular unit has a count of 0 for a particular group. Bootstrapping (or, equivalently, sampling from a multinomial distribution with group probabilities given by the group composition of the unit) guarantees that the count will be zero again, but that doesn't seem quite right.
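A small sketch of the zero-count point, using a hypothetical unit with three groups where group `z` was never observed (the counts and names are made up for illustration):

```r
# Hypothetical unit with three groups; group "z" has an observed count of 0.
counts <- c(x = 40, y = 60, z = 0)
n <- sum(counts)

set.seed(42)
# 1000 bootstrap replicates: resample n individuals using the observed
# group composition as sampling weights.
z_counts <- replicate(1000, {
  draws <- sample(names(counts), size = n, replace = TRUE, prob = counts)
  sum(draws == "z")
})

# A group with weight 0 can never be drawn, so every replicate has 0 for "z".
all(z_counts == 0)
```

Because the sampling weight for `z` is exactly zero, no replicate can ever contain it, so the bootstrap shows no uncertainty for that cell.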
Just a thought, but it might run faster if you use `rmultinom` for the bootstrapping, if I'm right that bootstrapping is equivalent to drawing from a multinomial distribution.
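As a rough sketch of that suggestion, assuming the cell counts are held in a plain numeric vector (the `freq` values here are hypothetical):

```r
# Hypothetical cell counts for the unit-group combinations.
freq <- c(30, 10, 5, 55)
n_total <- sum(freq)

set.seed(1)
# One multinomial draw returns resampled counts for all cells at once,
# without generating n_total individual draws first.
# rmultinom() normalizes prob internally, so raw counts can be passed.
boot_freq <- rmultinom(n = 1, size = n_total, prob = freq)[, 1]

# Many replicates in a single call: one column per bootstrap sample.
boot_mat <- rmultinom(n = 100, size = n_total, prob = freq)
```

Each column of `boot_mat` sums to `n_total`, so every column can be treated as one bootstrap replicate of the count table.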
> I see. I may have just misunderstood the underlying code. I guess when you resample, you return what are essentially individual observations and this [bit] collapses those back down to counts at the group and unit level.
Exactly.
> For my part, I've been trying to wrap my head around whether this process makes sense when a particular unit has a count of 0 for a particular group. Bootstrapping (or, equivalently, sampling from a multinomial distribution with group probabilities given by the group composition of the unit) guarantees that the count will be zero again, but that doesn't seem quite right.
Yeah, I see what you mean. I'll leave this issue open to think more about that. It seems, though, that what you want would only be possible once you impose some model.
And yes, I'll see whether I can make the bootstrapping go a bit faster.
Sampling from a multinomial was a great idea. I've implemented that (e60fb706d8d90d31) and see a four-fold speed increase.
Hi,
Thanks for the package! You've put a lot of work into it!
I'm wondering about the bootstrapping procedure, though. It looks like you are bootstrapping based on samples (with replacement) of group-by-unit observations, but is that the right way to go? Wouldn't it make more sense to take bootstrap samples of individuals within units? Much more computationally intensive, but it seems more theoretically justifiable.