There are two things that can be changed in the variant matrix:
1. Add or remove variants.
2. Change the definitions of the variants (include or exclude some mutations).
I'm mainly interested in 1, but some work should be done for 2 as well.
This should take the form of a vignette that repeatedly re-fits the same model to the same data, each time using a random subset of the variants in the variant matrix, to see how the estimates change. This is essentially an investigation of multicollinearity among the predictors, which is tricky because the predictors are binary, so the usual diagnostics for continuous predictors may not carry over directly. It can also be framed as a study of variable importance. A second section of the vignette might randomly toggle a few entries of the variant matrix to see how important individual mutations are to a given variant.
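As a rough illustration of the two experiments described above, here is a minimal sketch in plain Python. It is not provoc code. It assumes the variant matrix has one row per variant and one column per mutation, and it uses a lightly ridge-stabilised least-squares fit as a stand-in for the real model. All names (`fit_proportions`, `refit_on_subsets`, `toggle_entries`) and the toy data are hypothetical.

```python
import random


def fit_proportions(variant_rows, freqs, ridge=1e-8):
    """Stand-in fit (NOT provoc's model): least squares of the observed
    mutation frequencies on the binary variant definitions.

    variant_rows: dict of variant name -> list of 0/1, one entry per mutation.
    Returns a dict of variant name -> estimated coefficient.
    """
    names = list(variant_rows)
    k = len(names)
    # Normal equations G x = rhs. The small ridge keeps G invertible when
    # two variants have identical (perfectly collinear) definitions.
    G = [[sum(u * v for u, v in zip(variant_rows[p], variant_rows[q]))
          + (ridge if i == j else 0.0)
          for j, q in enumerate(names)]
         for i, p in enumerate(names)]
    rhs = [sum(u * f for u, f in zip(variant_rows[p], freqs)) for p in names]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(G[r][col]))
        G[col], G[piv] = G[piv], G[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, k):
            factor = G[r][col] / G[col][col]
            for c in range(col, k):
                G[r][c] -= factor * G[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution.
    x = [0.0] * k
    for r in reversed(range(k)):
        tail = sum(G[r][c] * x[c] for c in range(r + 1, k))
        x[r] = (rhs[r] - tail) / G[r][r]
    return dict(zip(names, x))


def refit_on_subsets(variant_rows, freqs, keep, n_reps, rng):
    """Refit on n_reps random subsets of `keep` variants and collect,
    per variant, the estimates from every refit that included it."""
    names = list(variant_rows)
    estimates = {name: [] for name in names}
    for _ in range(n_reps):
        subset = rng.sample(names, keep)
        fit = fit_proportions({n: variant_rows[n] for n in subset}, freqs)
        for name, est in fit.items():
            estimates[name].append(est)
    return estimates


def toggle_entries(variant_rows, variant, n_toggles, rng):
    """Return a copy of the matrix with n_toggles randomly chosen 0/1
    entries of one variant flipped; the input is left unchanged."""
    toggled = {n: list(row) for n, row in variant_rows.items()}
    for j in rng.sample(range(len(toggled[variant])), n_toggles):
        toggled[variant][j] = 1 - toggled[variant][j]
    return toggled


# Toy data (made up): three variants over six mutations. A and C overlap.
variants = {
    "A": [1, 1, 0, 0, 1, 0],
    "B": [0, 1, 1, 0, 0, 1],
    "C": [1, 1, 0, 1, 0, 0],
}
true_props = {"A": 0.5, "B": 0.3, "C": 0.2}
freqs = [sum(true_props[n] * variants[n][j] for n in variants)
         for j in range(6)]

rng = random.Random(1)
full_fit = fit_proportions(variants, freqs)
subset_fits = refit_on_subsets(variants, freqs, keep=2, n_reps=20, rng=rng)
toggled = toggle_entries(variants, "A", n_toggles=1, rng=rng)
toggled_fit = fit_proportions(toggled, freqs)
```

Comparing the spread of each variant's entries in `subset_fits` against `full_fit` gives a crude picture of how sensitive that estimate is to which other variants are present. Comparing `toggled_fit` with `full_fit` does the same for individual mutations. A real vignette would swap the stand-in fit for the actual model.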
Ideally, this work would lead to new variable importance measures, implemented as functions that are also useful for models outside of provoc.