bioforensics / MicroHapulator

Tools for empirical microhaplotype calling, forensic interpretation, and simulation.
https://microhapulator.readthedocs.io/

Perform variant calling at non-target positions #145

Open: standage opened this issue 2 years ago

standage commented 2 years ago

At the moment, the core haplotype calling algorithm considers only an explicitly designated set of SNPs. But loci often harbor rare or cryptic variation at non-target sites. This thread is a placeholder and a reminder to come back at some point in the future and implement features for calling SNPs (or maybe even small indels) at all sites in the reference.
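To make the idea concrete, here is a minimal sketch of naive per-position allele counting across an entire locus. It is not MicroHapulator's implementation: the function name `call_variants`, the `min_freq` threshold, and the input format (gapless reads already aligned to the marker reference starting at offset 0) are hypothetical simplifications. Real data would need proper alignment parsing and indel handling.

```python
from collections import Counter


def call_variants(reference, reads, min_freq=0.1):
    """Naive per-position variant calling over gapless, pre-aligned reads.

    Hypothetical input format: each read is a string aligned to `reference`
    starting at offset 0, with no indels. Characters other than A/C/G/T
    (e.g. 'N' or '-') are treated as uncovered. Returns a list of
    (offset, ref_base, alt_base, alt_count, depth) tuples for every
    non-reference allele whose frequency is at least `min_freq`.
    """
    calls = []
    for offset, ref_base in enumerate(reference):
        # Tally the bases observed at this offset across all covering reads.
        counts = Counter()
        for read in reads:
            if offset < len(read) and read[offset] in "ACGT":
                counts[read[offset]] += 1
        depth = sum(counts.values())
        if depth == 0:
            continue  # no coverage at this position
        for base, count in sorted(counts.items()):
            if base != ref_base and count / depth >= min_freq:
                calls.append((offset, ref_base, base, count, depth))
    return calls
```

For example, `call_variants("ACGTACGT", ["ACGTACGT", "ACGAACGT", "ACGAACG", "ACGTACGT"])` reports a single T>A call at offset 3 supported by 2 of 4 reads. Every site is examined, not just the designated SNPs, which is the point of the proposal.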

Some folks argue that variant calls, or perhaps even the entire locus sequence, are the MH alleles of the future. I’m sympathetic on a philosophical level, but there are practical obstacles to realizing that glorious future. What I propose here would be an incremental step in that direction: it would provide complete backwards and forwards compatibility for markers whose SNP definitions may change over time, while also providing the data needed to begin experimenting with comprehensive variant call sets at each locus.

For now, we’ll probably want to store variant calls, along with the marker reference sequence, separately from the MH allele tallies.

standage commented 2 years ago

Experimented a bit today, and the following seems to be a reasonable starting point.

This would create a set of "de novo variants", which could be combined with user-supplied "reference variants" to specify the final "marker definitions" to be used by mhpl8r type for haplotype calling.
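One way to picture the combination step is a per-marker merge of offsets in which user-supplied reference variants take precedence, so that existing target SNP definitions stay stable. This is a hypothetical sketch: the function name `merge_marker_definitions` and the dict-of-offsets input format are illustrative assumptions, not the format `mhpl8r type` actually reads.

```python
def merge_marker_definitions(reference_variants, denovo_variants):
    """Combine user-supplied and de novo variant offsets for each marker.

    Hypothetical input format: both arguments map a marker name to an
    iterable of 0-based offsets. Returns a dict mapping each marker name to
    a sorted list of (offset, source) tuples, where source is "reference"
    if the user supplied that offset and "de novo" otherwise.
    """
    merged = {}
    for marker in set(reference_variants) | set(denovo_variants):
        ref = set(reference_variants.get(marker, ()))
        # De novo offsets that duplicate a reference offset are dropped, so
        # user-supplied definitions always win.
        new = set(denovo_variants.get(marker, ())) - ref
        entries = [(o, "reference") for o in ref] + [(o, "de novo") for o in new]
        merged[marker] = sorted(entries)
    return merged
```

Tagging each offset with its source keeps the original SNP definition recoverable, which is what would preserve compatibility as de novo sites are added or removed over time.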

standage commented 2 years ago

Should investigate variant filtering.
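As a starting point for that investigation, a simple hard filter on read depth, alternate-allele count, and alternate-allele frequency might look like the following. The `VariantCall` record, the `filter_calls` name, and all threshold defaults are hypothetical placeholders chosen for illustration; suitable values would have to be determined empirically.

```python
from typing import List, NamedTuple


class VariantCall(NamedTuple):
    """Hypothetical record for a single de novo variant call."""
    marker: str
    offset: int
    ref: str
    alt: str
    alt_count: int
    depth: int


def filter_calls(calls, min_depth=20, min_alt_count=5, min_alt_freq=0.05) -> List[VariantCall]:
    """Keep calls that pass all three hard thresholds (placeholder defaults)."""
    kept = []
    for call in calls:
        if call.depth < min_depth:
            continue  # too little coverage to trust the call
        if call.alt_count < min_alt_count:
            continue  # too few reads support the alternate allele
        if call.alt_count / call.depth < min_alt_freq:
            continue  # likely sequencing error rather than true variation
        kept.append(call)
    return kept
```

Hard thresholds are only the simplest option; they ignore base and mapping quality, strand bias, and error profiles specific to each locus, all of which may matter for rare variants.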