genepi / haplocheck

Phylogeny-based Contamination Detection in Mitochondrial and Whole-Genome Sequencing Studies
http://mitoverse.i-med.ac.at/
MIT License

Can Haplocheck be used to detect contamination in non-human species data? #18

Open maruiqi0710 opened 1 year ago

maruiqi0710 commented 1 year ago

I have sequenced a batch of yeast samples and would like to use Haplocheck to detect contamination in the WGS data. The examples in "Contamination detection in sequencing studies using the mitochondrial phylogeny" all use human sequencing data, and the software is based on Phylotree. Can Haplocheck be used for non-human species? Thanks.

haansi commented 1 year ago

In principle, yes: we have updated the underlying Haplogrep to version 3, which supports phylogenetic trees other than the human one. However, we would need a recent mitochondrial phylogeny for yeast, mouse, etc. in the form of a mutation-annotated tree with haplogroups/clades as identifiers. So it is not straightforward, but it is theoretically possible. In summary, we would need a phylogenetic tree represented similarly to Phylotree (example: http://phylotree.org/tree/A.htm), plus the reference sequence and its annotations in a GFF3 file. With those, we could update Haplogrep to work with the new tree and integrate it into Haplocheck.
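
To illustrate what "mutation-annotated tree with clades as identifiers" means, here is a minimal Python sketch. It is a toy model only: the `Clade` class, the clade names, the mutation labels, and the overlap score are all hypothetical, and none of this is Haplogrep's actual tree format or classification metric. Each clade lists the mutations that define it relative to its parent, and a sample is assigned to the clade whose accumulated mutations overlap best with the sample's variants.

```python
from dataclasses import dataclass, field


@dataclass
class Clade:
    """One node of a toy mutation-annotated tree (hypothetical structure)."""
    name: str                      # clade / haplogroup identifier
    mutations: list                # mutations defining this clade vs. its parent
    children: list = field(default_factory=list)


def expected_mutations(root):
    """Map each clade name to all mutations on its path from the root."""
    result = {}

    def walk(node, inherited):
        current = inherited | set(node.mutations)
        result[node.name] = current
        for child in node.children:
            walk(child, current)

    walk(root, frozenset())
    return result


def best_clade(root, sample):
    """Return the clade whose expected mutations best match the sample.

    Uses a naive Jaccard overlap; this is an assumption for illustration,
    not the metric a real classifier would use.
    """
    sample = set(sample)

    def score(expected):
        union = expected | sample
        return len(expected & sample) / len(union) if union else 0.0

    table = expected_mutations(root)
    return max(table, key=lambda name: score(table[name]))


# Made-up tree with made-up mutation labels (position + derived base).
tree = Clade("root", [], [
    Clade("A", ["100T", "250C"], [Clade("A1", ["731G"])]),
    Clade("B", ["412A"]),
])

print(best_clade(tree, ["100T", "250C", "731G"]))
```

A real tree for yeast or mouse would additionally need to handle back mutations, indels, and per-mutation weights, which this sketch ignores.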

maruiqi0710 commented 1 year ago

Thanks for your reply. Could you provide a detailed tutorial for beginners? I found the documentation at https://haplogrep.readthedocs.io/en/latest/trees, but the description is too brief. I also found an example at https://github.com/genepi/phylotree-rcrs-17/tree/main/src. I only know how to create the files generated by bwa index and the *.dict file (generated by gatk CreateSequenceDictionary). I don't know how to create the other files in that directory, such as the YAML file, rules.csv, tree.xml, weights.txt, and the annotations folder. Another important question: how are the results of Haplogrep 3 integrated into Haplocheck?