UCSC-nanopore-cgl / margin

MIT License

Local phasing correctness #1

Closed jeizenga closed 3 years ago

jeizenga commented 3 years ago

A fully functioning implementation of the local phasing correctness metric I've been talking about. It's implemented as a separate tool called calcLocalPhasingCorrectness, which outputs a table of the LPC values for a grid of length scales.

One odd thing I've noticed is that in the fully global (rho = 1) metric, the values get pulled sharply upward toward 1. I'm pretty sure this is because I count a pair of variants as correctly phased if they are in different phase sets, and at the scale of an entire chromosome most pairs of variants are in different phase sets. This is undesirable behavior, IMO, but we can discuss later how to approach it.
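To make the inflation effect concrete, here is a minimal Python sketch. It is not the code in calcLocalPhasingCorrectness. It assumes the metric is a weighted fraction of heterozygous-variant pairs whose relative phase agrees with the truth, with pair weight rho^(number of het variants between them), and that pairs in different phase sets count as correct. The function name, argument layout, and weighting formula are all hypothetical and chosen only for illustration.

```python
from itertools import combinations


def local_phasing_correctness(truth, query, phase_sets, rho):
    """Illustrative LPC-style score (hypothetical, not the tool's code).

    truth, query: 0/1 haplotype assignments per het variant, in genomic order.
    phase_sets:   phase-set ID of each variant in the query phasing.
    rho:          decay per intervening het variant, in [0, 1].
                  rho = 0 only scores adjacent pairs; rho = 1 weights all pairs equally.
    """
    weighted_correct = 0.0
    weighted_total = 0.0
    for i, j in combinations(range(len(truth)), 2):
        # Assumed weighting: decay over the number of het variants between i and j.
        weight = rho ** (j - i - 1)
        if phase_sets[i] != phase_sets[j]:
            # Pairs in different phase sets are counted as correct,
            # which is the behavior that inflates the global score.
            correct = True
        else:
            # Same phase set: the pair is correct if the query agrees with
            # the truth on whether the two alleles share a haplotype.
            correct = (truth[i] == truth[j]) == (query[i] == query[j])
        weighted_correct += weight * correct
        weighted_total += weight
    return weighted_correct / weighted_total if weighted_total else 1.0


truth = [0, 0, 0, 0]
query = [0, 0, 1, 1]  # one switch error in the middle

one_set = local_phasing_correctness(truth, query, [1, 1, 1, 1], rho=1.0)
two_sets = local_phasing_correctness(truth, query, [1, 1, 2, 2], rho=1.0)
```

In this toy case the same query phasing scores much lower when everything is in one phase set than when the switch error coincides with a phase-set boundary, because every cross-set pair is scored as correct.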

Another limitation is that I've currently only implemented the version of the metric where the decay is over the number of heterozygous variants, rather than bases of sequence. It shouldn't be much more work to extend it to the sequence-length version now that the basic skeleton is in place and debugged.

It looks like I have some edits from Trevor to the README and to some unit tests, which accidentally got rolled up in this branch.