Closed ArtPoon closed 2 years ago
A better approach would be to use non-parametric bootstrap resampling to estimate confidence intervals. I tried a very rudimentary approach here:
> boots <- sapply(1:1000, function(i) {
+   idx <- sample(1:length(y), length(y), replace=T)
+   fit1 <- glm(cbind(y[idx], n[idx]-y[idx]) ~ 1, family='quasibinomial')
+   fit1$coefficients[1]
+ })
> inv.logit <- function(x) {
+   if (x > 100) {
+     return (1)
+   }
+   exp(x) / (1+exp(x))
+ }
> res <- sapply(boots, inv.logit)
which yielded the following histogram:
This yields the following mean and 95% confidence interval:
> summary(res)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
 0.0000  0.1084  0.9266  0.6319  1.0000  1.0000
> quantile(res, c(0.025, 0.975))
 2.5% 97.5%
    0     1
Since this seems to be associated with calling BA.1, we should check whether our constellation file is out of date and, if so, whether updating it fixes the problem.
According to cov-lineages.org, our constellation list is missing 4 substitutions.
We should be able to simply update our submodule clone.
Sure
@GopiGugan we should always attempt to update the submodule every time the pipeline is being run
Okay, I'll add that in
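One way the update step could look, as a rough sketch only: it assumes the pipeline can shell out to git from R, and the helper name update_submodule, its arguments, and the default directory are all hypothetical rather than taken from the repository. It relies on the standard `git submodule update --init --remote` command.

```r
# Hypothetical helper; 'repo_dir' is a placeholder, not the pipeline's actual layout.
update_submodule <- function(repo_dir = ".", run = TRUE) {
  # 'git -C <dir>' runs the command as if started in <dir>;
  # '--remote' pulls the submodule's upstream tip instead of the recorded commit.
  args <- c("-C", shQuote(repo_dir), "submodule", "update", "--init", "--remote")
  if (!run) {
    return(args)  # dry run: only show what would be executed
  }
  status <- system2("git", args)
  if (status != 0) {
    warning("submodule update failed; constellation files may be out of date")
  }
  invisible(status)
}

# Dry run, printing the git arguments without touching the repository.
print(update_submodule(run = FALSE))
```

Warning instead of stopping is a design choice here, so that a run without network access can still proceed with the existing constellation files.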
At the meeting this week, the users indicated that they want these cases to be reported as "poor sequencing coverage" or "insufficient data to estimate" or something along those lines - rather than our current method of drawing the bar with mean and confidence interval for the bootstrap estimates.
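A minimal sketch of how such cases might be flagged, assuming the bootstrap estimates are already on the frequency scale (like res above). The helper name summarize_boots and the cutoff max_width = 0.5 are assumptions made for illustration; the actual criterion would need to be agreed with the users.

```r
# Hypothetical helper: 'max_width' is an assumed cutoff on the 95% interval width.
summarize_boots <- function(res, max_width = 0.5) {
  ci <- quantile(res, c(0.025, 0.975), na.rm = TRUE, names = FALSE)
  # Flag the estimate when the interval is undefined or too wide to be informative.
  if (any(is.na(ci)) || diff(ci) > max_width) {
    return(list(status = "insufficient data to estimate", mean = NA, ci = ci))
  }
  list(status = "ok", mean = mean(res, na.rm = TRUE), ci = ci)
}

# Made-up bootstrap estimates spread over the whole unit interval.
wide <- c(rep(0, 300), rep(0.5, 100), rep(1, 600))
# Made-up estimates that are tightly clustered.
narrow <- seq(0.40, 0.44, length.out = 1000)

print(summarize_boots(wide)$status)
print(summarize_boots(narrow)$status)
```

The plotting code could then draw the bar only when status is "ok" and print the label otherwise.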
A user reported that the estimated frequency for a variant was NA for two samples in their run. We investigated and found that this was being caused by a numerical overflow issue: when we calculate variant frequency as exp(fit$coef) / (1+exp(fit$coef)), the exp overflows. A simple solution is to screen for fit$coef > 100 and assign 1 instead of calculating this inverse logit expression.
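The overflow is easy to reproduce in isolation. The sketch below uses a made-up coefficient of 800 (not a value from the affected samples) and compares the naive expression with the screening fix and with base R's plogis(), which evaluates the same inverse logit without overflowing. plogis() is mentioned only as a possible alternative to the threshold, not as what the pipeline currently does.

```r
# Hypothetical coefficient, large enough that exp() overflows to Inf.
coef <- 800

# Naive inverse logit: exp(800) is Inf, and Inf / (1 + Inf) is NaN.
naive <- exp(coef) / (1 + exp(coef))

# The screening fix, written with ifelse() so it also accepts vectors.
inv.logit <- function(x) ifelse(x > 100, 1, exp(x) / (1 + exp(x)))
screened <- inv.logit(coef)

# stats::plogis() computes the inverse logit in a numerically stable way.
stable <- plogis(coef)

print(c(naive = naive, screened = screened, stable = stable))
```

Because is.na(NaN) is TRUE in R, the naive result propagates as a missing value, which is consistent with the NA the user saw.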