ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

cactus vs multiz #1523

Open evgenyleushkin opened 1 week ago

evgenyleushkin commented 1 week ago

We were curious to compare our new Cactus alignments with those we obtained previously with multiz. Our test group was 16 bat genomes aligned to hg38. We compared coverage and identity stats (using mafCoverage) and found that Cactus scored ~3% lower on average. What could explain this? Perhaps the hal2maf conversion options (I tried several combinations, with more or less similar outcomes). I also used --dupeMode single to avoid self-alignments. Is there something else we are not taking into account here? Or are these metrics unsuitable for this comparison for some reason?
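To make the two metrics concrete, here is a minimal sketch of how coverage and identity could be computed for one pair of gapped alignment rows (reference vs. one query species). It is an illustration under assumptions, not a description of how mafCoverage computes its numbers: the function name `coverage_and_identity`, the definitions used (coverage = aligned reference bases / reference length, identity = identical bases / aligned bases), and the example sequences are all hypothetical.

```python
def coverage_and_identity(ref_row, qry_row, ref_length):
    """Return (coverage, identity) for one pair of gapped alignment rows.

    Assumed definitions (hypothetical, for illustration only):
      coverage = columns where both rows have a base / ref_length
      identity = columns where both bases match / columns where both have a base
    """
    if len(ref_row) != len(qry_row):
        raise ValueError("alignment rows must have equal length")
    if ref_length <= 0:
        raise ValueError("ref_length must be positive")

    aligned = 0
    identical = 0
    for r, q in zip(ref_row.upper(), qry_row.upper()):
        if r == "-" or q == "-":
            continue  # a gap in either row means no base-to-base alignment
        aligned += 1
        if r == q:
            identical += 1

    coverage = aligned / ref_length
    identity = identical / aligned if aligned else 0.0
    return coverage, identity


# Made-up rows: 8 columns align base-to-base, 7 of them are identical.
coverage, identity = coverage_and_identity("ACGT-ACGTA", "ACGTTAC-TT", ref_length=10)
print(coverage, identity)  # 0.8 0.875
```

Differences in how a tool treats gaps, duplicated rows, or the denominator can shift both numbers, so it may be worth checking that both alignments are being scored under the same definitions.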

[Image: Screenshot 2024-11-12 at 17 37 40]

Thanks a lot for your help! Evgeny

glennhickey commented 3 days ago

Cactus can have lower sensitivity due to the progressive decomposition (information can be lost between distant species in the tree since the alignment has to transitively cross everything in between), as well as the fact that it makes more effort than multiz to make evolutionarily consistent alignments. The first issue is more pronounced for lower-quality (less complete) assemblies.
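As a rough illustration of the transitive effect, here is a toy model (an assumption made for this sketch, not how Cactus works internally): suppose each branch crossed on the tree path between two species independently retains some fraction of the alignable bases. The pairwise retention is then the product over the path. The function name `path_retention` and the 0.99 per-branch value are hypothetical.

```python
import math


def path_retention(branch_retentions):
    """Toy model: fraction of alignable bases surviving a path through the tree.

    Assumes (hypothetically) that each branch independently retains a fixed
    fraction of bases, so losses multiply along the path.
    """
    for r in branch_retentions:
        if not 0.0 <= r <= 1.0:
            raise ValueError("each retention must be between 0 and 1")
    return math.prod(branch_retentions)


# A direct comparison (one step) vs. a path crossing four branches,
# each with a made-up retention of 0.99.
direct = path_retention([0.99])
transitive = path_retention([0.99] * 4)
print(f"{direct:.4f} {transitive:.4f}")  # 0.9900 0.9606
```

In this toy setting a small per-branch loss compounds to a few percent over a handful of branches. Real losses will depend on assembly quality and divergence, so the numbers here are only meant to show the direction of the effect.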