Open apriha opened 6 years ago
Consider integrating https://github.com/poruloh/Eagle
This only seems useful if no familial DNA is available - identity by descent (IBD) gives a much more conclusive phasing result than statistical methods.
@ebacherdom, I agree. As discussed above, I think using the results of `find_shared_dna` would help with this, especially when more comparisons between individuals in a family group are available. Formally, I think this is a constraint satisfaction problem.
Combine techniques identified by Whit Athey in "Phasing the Chromosomes of a Family Group When One Parent is Missing" and the results of `find_shared_dna` to reconstruct genomes of maternal and/or paternal ancestors.

This can be approached as a constraint satisfaction problem. For example, the algorithm could be provided several individuals, with the maternal and/or paternal relationships also identified (e.g., `siblings = [ind1, ind2]; mother = [ind3]; paternal_relation = [ind4]`). Then, shared DNA could be discovered by `find_shared_dna` between all combinations of individuals. This information - whether the various combinations of individuals share one chromosome, both chromosomes, or no chromosomes for a given SNP position - would serve as the constraints for reconstructing the ancestral genomes.

As a simple example, say two siblings have genotypes of `CA` and `AG` at a given SNP. If one knew they shared one chromosome at that location, `AN` could be attributed to one parent and `CG` to the other, where `N` would be any allele. Additional comparisons between other individuals could further narrow the solution space for the ancestral genomes.
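
To make the sibling example concrete, here is a minimal brute-force sketch of the constraint check at a single SNP. It is only an illustration, not an existing API: `consistent_parents` and `n_shared` are hypothetical names, and it assumes the number of shared chromosomes at the position is already known (for instance, derived from the output of `find_shared_dna`), that genotypes are unphased two-letter strings, and that there are no genotyping errors or new mutations.

```python
from itertools import product

ALLELES = "ACGT"


def consistent_parents(sib1, sib2, n_shared):
    """Enumerate parental genotype pairs consistent with two sibling
    genotypes at one SNP, given how many chromosomes (0, 1, or 2) the
    siblings share there. Hypothetical helper, for illustration only."""
    target1, target2 = sorted(sib1), sorted(sib2)
    solutions = set()
    # Try every possible pair of parental genotypes.
    for a1, a2, b1, b2 in product(ALLELES, repeat=4):
        parent_a, parent_b = (a1, a2), (b1, b2)
        # i, k: which chromosome of parent A sibling 1 / sibling 2 inherits.
        # j, l: the same for parent B.
        for i, j, k, l in product((0, 1), repeat=4):
            # Constraint: the siblings share exactly n_shared chromosomes.
            if (i == k) + (j == l) != n_shared:
                continue
            child1 = sorted((parent_a[i], parent_b[j]))
            child2 = sorted((parent_a[k], parent_b[l]))
            if child1 == target1 and child2 == target2:
                # Store genotypes unphased, and the parent pair unordered.
                genotypes = ("".join(sorted(parent_a)), "".join(sorted(parent_b)))
                solutions.add(tuple(sorted(genotypes)))
    return solutions


print(sorted(consistent_parents("CA", "AG", n_shared=1)))
# [('AA', 'CG'), ('AC', 'CG'), ('AG', 'CG'), ('AT', 'CG')]
```

The four solutions correspond to `AN` for one parent (with `N` being any of the four alleles) and `CG` for the other, matching the example above. Each additional comparison would add another constraint of the same kind and remove candidates from this set.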