fedarko / strainFlye

Pipeline for analyzing (rare) mutations in metagenome-assembled genomes
BSD 3-Clause "New" or "Revised" License

Generalize to multi-allelic mutations #28

Open fedarko opened 2 years ago

fedarko commented 2 years ago

Making this generalization for just a single analysis is relatively easy; making it for every step of the pipeline, and testing it at every step, will be much more involved.

Long story short, the ideas of a p-mutation or an r-mutation are currently defined in a binary way -- a position either is a mutation (aka "is mutated") or it isn't. But positions can have multiple alternate nucleotides, and ideally we would account for these rather than ignoring them.
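To make the distinction concrete, here is a minimal sketch contrasting the binary view with a per-position count of mutations. It is not strainFlye's actual code: the function names and the `min_alt_reads` cutoff are hypothetical, and the cutoff only loosely imitates a read-count style threshold. The real p-mutation and r-mutation definitions may differ. The sketch also assumes the most common nucleotide at a position is the consensus.

```python
def alt_counts(counts):
    """Return the read counts of the alternate nucleotides, highest first.

    `counts` maps each nucleotide to its read count at one position.
    Assumption: the most common nucleotide is the consensus, so every
    other nucleotide is an alternate.
    """
    ordered = sorted(counts.values(), reverse=True)
    return ordered[1:]


def is_mutated_binary(counts, min_alt_reads):
    """Binary view: only the most common alternate nucleotide is checked."""
    alts = alt_counts(counts)
    return bool(alts) and alts[0] >= min_alt_reads


def num_mutations(counts, min_alt_reads):
    """Multi-allelic view: count every alternate nucleotide that passes."""
    return sum(1 for c in alt_counts(counts) if c >= min_alt_reads)


# Hypothetical position: A is the consensus, C and G are both well supported.
position = {"A": 90, "C": 6, "G": 5, "T": 0}
print(is_mutated_binary(position, min_alt_reads=5))
print(num_mutations(position, min_alt_reads=5))
```

For this hypothetical position, the binary view reports only that the position is mutated, while the multi-allelic view reports that two alternate nucleotides pass the cutoff.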

It's simple to extend the idea of a p-mutation or an r-mutation to multi-allelic positions (so that a position can have 0, 1, 2, or 3 mutations, rather than 0 or 1). However, this change will affect: