apriha / lineage

tools for genetic genealogy and the analysis of consumer DNA test results
MIT License

Add ability to reconstruct genomes #2

Open apriha opened 6 years ago

apriha commented 6 years ago

Combine the techniques identified by Whit Athey in "Phasing the Chromosomes of a Family Group When One Parent is Missing" with the results of find_shared_dna to reconstruct the genomes of maternal and/or paternal ancestors.

This can be approached as a constraint satisfaction problem. For example, the algorithm could be given several individuals along with their known maternal and/or paternal relationships (e.g., siblings = [ind1, ind2]; mother = [ind3]; paternal_relation = [ind4]). find_shared_dna could then be used to discover shared DNA between all combinations of individuals. This information (whether a given combination of individuals shares one chromosome, both chromosomes, or no chromosomes at a given SNP position) would serve as the constraints for reconstructing the ancestral genomes.
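To make the inputs concrete, here is a rough sketch of how the relationships and per-position sharing constraints might be represented. Everything in it is hypothetical: the `family` and `shared_segments` structures, the coordinates, and the helper names are illustrative only, and the segment table merely stands in for whatever find_shared_dna actually returns. It also assumes comparisons are made pairwise.

```python
from itertools import combinations

# Hypothetical family group, mirroring the roles named above.
family = {
    "siblings": ["ind1", "ind2"],
    "mother": ["ind3"],
    "paternal_relation": ["ind4"],
}

# Hypothetical shared segments per pair:
# (chromosome, start, end, number of chromosomes shared).
# In practice these would be derived from find_shared_dna results.
shared_segments = {
    ("ind1", "ind2"): [("1", 1_000_000, 5_000_000, 1), ("1", 7_000_000, 9_000_000, 2)],
    ("ind1", "ind3"): [("1", 0, 250_000_000, 1)],
    ("ind2", "ind3"): [("1", 0, 250_000_000, 1)],
}


def sharing_at(pair, chrom, pos):
    """Return 0, 1, or 2: how many chromosomes a pair shares at a position.

    Pairs with no recorded segments are treated as sharing nothing, which
    conflates "no shared DNA" with "not compared"; a real implementation
    would likely need to distinguish the two.
    """
    for seg_chrom, start, end, n_shared in shared_segments.get(pair, []):
        if seg_chrom == chrom and start <= pos <= end:
            return n_shared
    return 0


def constraints_at(chrom, pos):
    """Build the sharing constraint for every pair of individuals at one SNP."""
    individuals = sorted(ind for group in family.values() for ind in group)
    return {pair: sharing_at(pair, chrom, pos) for pair in combinations(individuals, 2)}
```

Calling `constraints_at("1", 2_000_000)` yields one entry per pair of the four individuals; a solver would then search for ancestral genotypes consistent with all of them at that position.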

As a simple example, say two siblings have genotypes of CA and AG at a given SNP. If they are known to share one chromosome at that location, the shared allele must be A, the only allele common to both genotypes. AN could then be attributed to one parent and CG to the other, where N is any allele. Additional comparisons between other individuals could further narrow the solution space for the ancestral genomes.
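The sibling example can be checked by brute force. The sketch below is only an illustration, not a proposed implementation: `parent_candidates` and its parameters are hypothetical names, and it assumes a single SNP with alleles limited to A, C, G, and T, no genotyping errors, and that "sharing a chromosome" means both siblings inherited the same copy from the same parent.

```python
from itertools import product

ALLELES = "ACGT"


def genotype(a, b):
    """Normalize an unordered pair of alleles, e.g. ('C', 'A') -> 'AC'."""
    return "".join(sorted(a + b))


def parent_candidates(sib1, sib2, n_shared):
    """Enumerate parental genotype pairs consistent with two sibling genotypes
    and the number of chromosomes (0, 1, or 2) the siblings share at this SNP."""
    target1, target2 = genotype(*sib1), genotype(*sib2)
    solutions = set()
    genotypes = list(product(ALLELES, repeat=2))
    for p1, p2 in product(genotypes, repeat=2):
        # i1, i2: which copy each sibling inherits from parent 1;
        # j1, j2: which copy each sibling inherits from parent 2.
        for i1, j1, i2, j2 in product((0, 1), repeat=4):
            if (i1 == i2) + (j1 == j2) != n_shared:
                continue  # violates the sharing constraint
            if genotype(p1[i1], p2[j1]) != target1:
                continue  # does not reproduce sibling 1
            if genotype(p1[i2], p2[j2]) != target2:
                continue  # does not reproduce sibling 2
            # Parents are unlabeled here, so store the pair in sorted order.
            solutions.add(tuple(sorted((genotype(*p1), genotype(*p2)))))
    return sorted(solutions)


print(parent_candidates("CA", "AG", n_shared=1))
# [('AA', 'CG'), ('AC', 'CG'), ('AG', 'CG'), ('AT', 'CG')]
```

Every candidate pairs CG with a genotype containing A, which matches the AN / CG attribution above. Constraints from additional relatives would prune the four remaining choices of N.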

apriha commented 6 years ago

Consider integrating https://github.com/poruloh/Eagle

ebacherdom commented 5 years ago

> Consider integrating https://github.com/poruloh/Eagle

This only seems useful if no familial DNA is available. Identity by descent (IBD) gives a much more conclusive result for phasing than statistical methods.

apriha commented 5 years ago

@ebacherdom, I agree. As discussed above, I think using the results of find_shared_dna would help with this, especially when more comparisons between individuals in a family group are available. Formally, I think this is a constraint satisfaction problem.