DASL-Lab / provoc

PROportions of Variants of Concern using counts, coverage, and a variant matrix.
https://dasl-lab.github.io/provoc/
MIT License

Investigate the usefulness of GLM standard errors #16

Open DBecker7 opened 9 months ago

DBecker7 commented 9 months ago

See this StackOverflow question, in particular the answer by atiretoo, which computes the SE in R. A more detailed description of these standard errors can be found here. These standard errors are derived for a binomial GLM with no constraints, which of course is not the case for our model.
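For reference, a sketch of the standard unconstrained result, assuming the canonical logit link (the notation $y_i$, $n_i$, $\hat p_i$ is introduced here for illustration and may not match the linked answers or our model). With $y_i$ successes out of $n_i$ trials and fitted probabilities $\hat p_i$:

$$
\widehat{\mathrm{Var}}(\hat\beta) = (X^TWX)^{-1}, \qquad
W = \mathrm{diag}\big(n_i\,\hat p_i\,(1-\hat p_i)\big), \qquad
\mathrm{SE}(\hat\beta_j) = \sqrt{\big[(X^TWX)^{-1}\big]_{jj}}
$$

Whether this approximation is still useful once the constraints are imposed is the open question here.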

Steps to investigate:

  1. Simulate many datasets from known parameters, refit the model on each, and use the spread of the estimates as a manual estimate of the sampling distribution.
    • Do this for a couple of different variants: some with a lot of shared mutations (highly multicollinear) and some with only a few.
  2. Calculate the SE from $(X^TWX)^{-1}$ for a single model and compare it to the manual SE.
  3. Calculate the SE from bootstrapping for a single model and compare it to the manual SE.
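
The three steps above could be prototyped along these lines. This is only a minimal sketch under loud assumptions: it uses a toy unconstrained logit binomial GLM with an intercept and one covariate, not the provoc model; the function names, the true parameters, the coverage of 50, and the choice of a parametric bootstrap are all hypothetical placeholders. It is written in plain Python so that it is self-contained; the real analysis would presumably be done in R against the package itself.

```python
import math
import random
import statistics


def score_and_information(beta, x, counts, coverage):
    """Score vector and Fisher information X^T W X for logit(p_i) = b0 + b1 * x_i."""
    b0, b1 = beta
    g0 = g1 = i00 = i01 = i11 = 0.0
    for xi, yi, ni in zip(x, counts, coverage):
        p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
        w = ni * p * (1.0 - p)  # i-th diagonal entry of W
        r = yi - ni * p         # observed minus expected count
        g0 += r
        g1 += r * xi
        i00 += w
        i01 += w * xi
        i11 += w * xi * xi
    return (g0, g1), (i00, i01, i11)


def fit(x, counts, coverage, n_iter=50, tol=1e-10):
    """Fit by Fisher scoring; return (estimates, SEs from (X^T W X)^{-1})."""
    beta = (0.0, 0.0)
    for _ in range(n_iter):
        (g0, g1), (i00, i01, i11) = score_and_information(beta, x, counts, coverage)
        det = i00 * i11 - i01 * i01
        # Step = (X^T W X)^{-1} * score, using the explicit 2x2 inverse.
        step = ((i11 * g0 - i01 * g1) / det, (i00 * g1 - i01 * g0) / det)
        beta = (beta[0] + step[0], beta[1] + step[1])
        if abs(step[0]) + abs(step[1]) < tol:
            break
    # Re-evaluate the information at the final estimates for the SEs.
    _, (i00, i01, i11) = score_and_information(beta, x, counts, coverage)
    det = i00 * i11 - i01 * i01
    return beta, (math.sqrt(i11 / det), math.sqrt(i00 / det))


def simulate(beta, x, coverage, rng):
    """Draw binomial counts, one per row, from the logit model."""
    counts = []
    for xi, ni in zip(x, coverage):
        p = 1.0 / (1.0 + math.exp(-(beta[0] + beta[1] * xi)))
        counts.append(sum(rng.random() < p for _ in range(ni)))
    return counts


def compare_standard_errors(seed=1, n_rep=300):
    """Return the slope SE from simulation, from (X^T W X)^{-1}, and from a bootstrap."""
    rng = random.Random(seed)
    x = [i / 14.5 - 1.0 for i in range(30)]  # 30 evenly spaced points on [-1, 1]
    coverage = [50] * 30                      # hypothetical read depth per row
    true_beta = (-1.0, 1.5)                   # hypothetical true parameters

    # Step 1: manual sampling distribution from repeated simulation and refitting.
    sims = [fit(x, simulate(true_beta, x, coverage, rng), coverage)[0][1]
            for _ in range(n_rep)]
    manual_se = statistics.stdev(sims)

    # Step 2: model-based SE from a single fitted model.
    observed = simulate(true_beta, x, coverage, rng)
    beta_hat, se_hat = fit(x, observed, coverage)

    # Step 3: parametric bootstrap, resimulating from that single fitted model.
    boots = [fit(x, simulate(beta_hat, x, coverage, rng), coverage)[0][1]
             for _ in range(n_rep)]
    bootstrap_se = statistics.stdev(boots)

    return {"manual": manual_se, "model": se_hat[1], "bootstrap": bootstrap_se}


print(compare_standard_errors())
```

In this well-behaved unconstrained setting the three numbers should roughly agree. The interesting cases for the vignette are the ones where they may not: constrained fits and highly collinear variant definitions.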

This analysis could be a vignette to demonstrate just how important it is to properly specify the variants.