eblerjana / pangenie

Pangenome-based genome inference
MIT License

Phasing #63

Open JosephLalli opened 10 months ago

JosephLalli commented 10 months ago

I understand that it has not been published, and I do not know whether it has been benchmarked in any way. But in theory PanGenie offers variant phasing via an HMM-based algorithm, which seems sensible.

I think that, using a handful of common SNPs as anchor points, this feature could be useful for phasing structural variants onto a short variant scaffold generated using more mainstream tools like SHAPEIT or Beagle.

What is the state of this phasing tool? I know PanGenie will generate phased datasets, but are they much better than flipping a coin? Or are they at all accurate?

eblerjana commented 10 months ago

The phasing feature is just some experimental code and has never been benchmarked. It runs the Viterbi algorithm on the HMM underlying PanGenie to produce haplotype paths. But the phasing results are most likely not accurate, because the phasing is based on only a small number of haplotypes, i.e. the panel is way too small to produce any useful results. Furthermore, the implementation always randomly samples 30 paths from the panel to do the phasing, since it becomes too slow otherwise. So I would definitely not recommend using it for any phasing analyses.
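
For readers unfamiliar with the idea, here is a rough sketch of Viterbi decoding over a subsampled panel. It is not PanGenie's code: the haploid copying model, the function names `subsample_panel` and `viterbi_copy_path`, and the `switch` and `err` parameters are all assumptions made for illustration. Only the cap of 30 sampled paths comes from the description above.

```python
import math
import random


def subsample_panel(panel, max_paths=30, seed=0):
    """Randomly keep at most max_paths haplotype paths (hypothetical helper)."""
    if len(panel) <= max_paths:
        return list(panel)
    return random.Random(seed).sample(panel, max_paths)


def viterbi_copy_path(panel, observed, switch=0.01, err=0.01):
    """Most likely sequence of panel paths that 'observed' copies from.

    Toy haploid model (an assumption, not PanGenie's HMM):
    hidden state = index of the panel path being copied at each site,
    transition   = stay with prob 1 - switch, else jump to another path,
    emission     = copied allele matches the observation with prob 1 - err.
    """
    n, m = len(panel), len(observed)
    log_stay = math.log(1 - switch)
    log_jump = math.log(switch / (n - 1)) if n > 1 else float("-inf")

    def emit(h, i):
        return math.log(1 - err) if panel[h][i] == observed[i] else math.log(err)

    # Initialisation: uniform prior over panel paths.
    score = [math.log(1 / n) + emit(h, 0) for h in range(n)]
    back = []

    # Recursion: O(m * n^2), which is why large panels get slow.
    for i in range(1, m):
        new_score, pointers = [], []
        for h in range(n):
            best_prev, best = 0, float("-inf")
            for g in range(n):
                cand = score[g] + (log_stay if g == h else log_jump)
                if cand > best:
                    best_prev, best = g, cand
            new_score.append(best + emit(h, i))
            pointers.append(best_prev)
        score = new_score
        back.append(pointers)

    # Backtrace from the best final state.
    state = max(range(n), key=lambda h: score[h])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


panel = subsample_panel([[0, 0, 0, 0], [1, 1, 1, 1]])
print(viterbi_copy_path(panel, [0, 0, 1, 1]))
```

In this toy model the decoded haplotype can only ever be a mosaic of the panel paths, which illustrates why a small panel limits what Viterbi-based phasing can recover.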