krshedd opened 6 years ago
Hi Kyle!
That's cool to hear that the PB method improves things in other real scenarios.
The correction in the PB method involves adding or subtracting an amount to each of the collection mixing proportions. So, rather than multiplying, you would want to add or subtract that amount back into the mixing proportion traces.
For example, let's say the estimated mixing proportion of collection i is p_i (without any parametric bootstrap). And let's say that with parametric bootstrap correction the estimated mixing proportion is p*_i.
Find the difference between those:
d_i = p*_i - p_i
Now, d_i is the amount that was added to p_i to try to make it less biased. So, if you had a trace of uncorrected values visited during the course of the chain, you could add d_i to each of those and use that as an approximate sample from a corrected posterior. Of course, you might get values <0 or greater than 1, in which case you would want to squash them to be 0 or 1.
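The recipe above (compute the shift d_i, add it to every sweep of the trace, squash to [0, 1]) can be sketched numerically. Here is a hypothetical Python/NumPy illustration with made-up values; it is not rubias code (rubias itself is R), just the arithmetic:

```python
import numpy as np

# Hypothetical values for three collections (made up for illustration)
p_uncorrected = np.array([0.60, 0.30, 0.10])  # p_i: uncorrected estimates
p_corrected   = np.array([0.68, 0.27, 0.05])  # p*_i: PB-corrected estimates

# d_i = p*_i - p_i, the shift the PB correction applied to each collection
d = p_corrected - p_uncorrected

# A toy trace of mixing proportions visited during the chain
# (rows = MCMC sweeps, columns = collections)
trace = np.array([
    [0.58, 0.32, 0.10],
    [0.62, 0.28, 0.10],
    [0.61, 0.30, 0.09],
])

# Shift every sweep by d, then squash out-of-range values into [0, 1]
corrected_trace = np.clip(trace + d, 0.0, 1.0)

# Column-wise quantiles of the shifted trace give approximate corrected CIs
ci = np.quantile(corrected_trace, [0.025, 0.975], axis=0)
```

The column-wise quantiles of `corrected_trace` then serve as the approximate bias-corrected credible intervals described above.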
This is sort of a hack, but given the way we tried to reduce the bias by a simple translation of the estimated values, it seems the most appropriate.
Cheers,
eric
I should probably add a function at some point that does the process described above.
I will call that an enhancement.
Hey Eric, that is what we thought, but just wanted to make sure we understood the method correctly. It should be pretty straightforward for us to do what you described on our end (solve for d_i and add to each trace with hard limits at 0 and 1). Thanks for the quick response!
Actually, upon further thought, the `bootstrapped_proportions` tibble doesn't have output by collection (i); it is already summed up to reporting units. I suppose we'd need the output to provide estimates of p*_i for the method you described above.
Oops! My bad.
That is right, Kyle. So, what you will need to do is imagine that p_i, p*_i, and d_i refer to the i-th reporting unit rather than the i-th collection, and do the same thing at the reporting-unit level.
Hey Eric, at the very end of `bootstrap_rho.R` you end up having to normalize `rho_pb` after squashing values < 0 to 0 in order for the stock comps to sum to 1. If we want to extract CIs from the parametric bootstrap corrected values with the method that you described above (i.e., apply d_i to each sweep of the trace to get a "bias corrected trace"), would we compute d_i before or after normalization? In other words, is it:
1) d_i = rho_est - rho_mean (pre-normalization, from the first line below), or
2) d_i = rho_pb - rho_est (post-normalization, an additional line at the end)?
I could be wrong, but it seems like we'd end up with different "bias corrected" results depending on both what level we normalize at (pi vs. rho) and when we normalize. Here are the lines in question from `bootstrap_rho.R`:
```r
rho_pb <- rho_est - (rho_mean - rho_est)
rho_pb[rho_pb < 0] <- 0
rho_pb <- rho_pb / sum(rho_pb)
rho_pb
```
Hope that makes sense. Thanks in advance!
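The two orderings Kyle asks about really can disagree once the squash/normalize step changes anything. A quick numeric check, using made-up rho values (not real rubias output) and Eric's convention that d_i is corrected minus uncorrected, sketched in Python/NumPy:

```python
import numpy as np

# Made-up values for three reporting units (not real rubias output)
rho_est  = np.array([0.70, 0.25, 0.05])  # uncorrected estimates
rho_mean = np.array([0.65, 0.24, 0.11])  # mean over parametric bootstrap reps

# The correction as in bootstrap_rho.R: translate, squash, normalize
rho_pb = rho_est - (rho_mean - rho_est)   # translate: [0.75, 0.26, -0.01]
rho_pb[rho_pb < 0] = 0.0                  # squash negatives to 0
rho_pb = rho_pb / rho_pb.sum()            # renormalize to sum to 1

# Option 1: d computed before squashing/normalizing
d_pre = rho_est - rho_mean

# Option 2: d computed from the final, normalized rho_pb
d_post = rho_pb - rho_est

print(d_pre)
print(d_post)
```

Here the third unit's translated value goes negative, so the squash-and-normalize step shifts all three components and `d_pre` and `d_post` come out different.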
Hey Eric, we've been doing some more testing with a Cook Inlet Chinook SNP baseline and found that the parametric bootstrap method was able to substantially improve reporting group performance on known mixtures. However, if we were to use that feature for reporting, it would be nice to show CIs or some other measure of variance around our estimates. Do you know of a good way to get an estimate of variance from `infer_mixture` when `method = "PB"`? Would it be appropriate to multiply the corrected estimates of pi by the traces in order to get CIs? Apologies if we are off base. Thanks in advance.