DASL-Lab / provoc

PROportions of Variants of Concern using counts, coverage, and a variant matrix.
https://dasl-lab.github.io/provoc/
MIT License

Identical variants after combining with data #22

Closed DBecker7 closed 4 months ago

DBecker7 commented 5 months ago

Suppose V1 has mutations m1, m2, and m3, and V2 has mutations m2, m3, and m4. If m1 and m4 are not in the data, then these two variants become identical: both reduce to m2 and m3. Their columns in the predictor matrix are then identical, so the matrix is rank-deficient and the numerical optimization routine will fail to converge.

We need a function that checks for this after the variant matrix is combined with the data (after the fuse() step), and possibly fixes the issue by removing one of the variants or renaming them.
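A rough sketch of what such a check could look like, written in Python purely for illustration (it is not the package's code, and it does not call fuse()). The function name `find_identical_variants` and the dict-of-dicts layout for the variant matrix are assumptions made for this example. The idea is to restrict each variant to the mutations that are actually in the data and then group variants whose restricted profiles match.

```python
def find_identical_variants(variant_matrix, observed_mutations):
    """Group variants that become identical on the observed mutations.

    variant_matrix: dict mapping variant name -> dict of mutation -> 0/1
        (hypothetical layout, chosen for this sketch only)
    observed_mutations: iterable of mutation names present in the data
    Returns a list of groups (lists of variant names) with identical profiles.
    """
    observed = sorted(set(observed_mutations))
    groups = {}
    for variant, profile in variant_matrix.items():
        # Profile restricted to observed mutations; absent entries count as 0.
        key = tuple(profile.get(m, 0) for m in observed)
        groups.setdefault(key, []).append(variant)
    # Only groups with more than one variant are a problem.
    return [names for names in groups.values() if len(names) > 1]


# The example from this issue, plus a third variant that stays distinct.
variant_matrix = {
    "V1": {"m1": 1, "m2": 1, "m3": 1},
    "V2": {"m2": 1, "m3": 1, "m4": 1},
    "V3": {"m1": 1, "m4": 1, "m5": 1},
}

# m1 and m4 are missing from the data, so V1 and V2 collapse together.
print(find_identical_variants(variant_matrix, ["m2", "m3", "m5"]))
```

Returning the groups rather than silently dropping a variant leaves the choice between removing and renaming to the caller.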

Renaming variants might cause problems later down the line. Not sure whether we should account for renamed variants in the later steps or avoid renaming altogether.