DASL-Lab / provoc

PROportions of Variants of Concern using counts, coverage, and a variant matrix.
https://dasl-lab.github.io/provoc/
MIT License

Similarity of Variants (before and after data) #26

Open DBecker7 opened 4 months ago

DBecker7 commented 4 months ago

Compute the Jaccard similarity for all pairwise combinations of variants.

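Extract the set of mutations for each variant, then compare the sets pairwise across all variants. The Jaccard similarity of two sets is the size of their intersection divided by the size of their union.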
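As a rough illustration of the set-based comparison described above, here is a minimal Python sketch (not the package's R code). The variant and mutation names are made up, and treating two empty sets as identical is an assumed convention, not something decided in this issue.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        # Assumed convention: two empty mutation sets count as identical.
        return 1.0
    return len(a & b) / len(union)


# Hypothetical mutation sets; names are illustrative only.
mutations = {
    "variant_A": {"m1", "m2", "m3"},
    "variant_B": {"m2", "m3", "m4"},
    "variant_C": {"m5"},
}

# One similarity value per unordered pair of variants.
pairwise = {
    (v1, v2): jaccard(mutations[v1], mutations[v2])
    for v1, v2 in combinations(sorted(mutations), 2)
}
```

Here variant_A and variant_B share two of four distinct mutations, so their similarity is 0.5, while variant_C shares nothing with either and scores 0.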

There are a couple of interesting cases:

The function should be able to calculate these before data are fused (i.e. with varmat) as well as after (i.e. with the fused data).

DBecker7 commented 4 months ago

The output should be matrices, shaped like the result of cov(data): one row and one column per variant. This makes them easy to plot as heatmaps for quick visual inspection.
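One possible shape for that output, sketched in Python rather than the package's R: the sketch assumes a 0/1 variant matrix with one row per variant and one column per mutation, and the function name jaccard_matrix and the example data are hypothetical.

```python
def jaccard_matrix(varmat, variants):
    """Return a symmetric variants-by-variants matrix as a list of lists.

    varmat maps each variant name to a 0/1 list over the same mutations
    (an assumed layout for this sketch).
    """
    n = len(variants)
    # Diagonal is 1.0: every variant is identical to itself.
    sim = [[1.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            row_i, row_j = varmat[variants[i]], varmat[variants[j]]
            both = sum(1 for x, y in zip(row_i, row_j) if x and y)
            either = sum(1 for x, y in zip(row_i, row_j) if x or y)
            # Assumed convention: two all-zero rows count as identical.
            sim[i][j] = sim[j][i] = both / either if either else 1.0
    return sim


# Hypothetical variant matrix over five mutations.
variants = ["variant_A", "variant_B", "variant_C"]
varmat = {
    "variant_A": [1, 1, 1, 0, 0],
    "variant_B": [0, 1, 1, 1, 0],
    "variant_C": [0, 0, 0, 0, 1],
}
sim = jaccard_matrix(varmat, variants)
```

Because the result is square and symmetric with a unit diagonal, it can be passed directly to a heatmap routine with the variant names as both axis labels.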

DBecker7 commented 3 months ago

Some good work has been done. However, the Jaccard similarity is not yet calculated correctly, and the code needs clean-up. I also need to make the plotting functions easier to tell apart, or combine them into a single function that uses arguments or dispatch to select which plot to draw.