evgenyleushkin opened 1 week ago
Cactus can have lower sensitivity for two reasons. First, because of its progressive decomposition, an alignment between two species that are distant in the tree has to be carried transitively through every ancestral genome in between, and information can be lost at each step. Second, Cactus makes more of an effort than multiz to produce evolutionarily consistent alignments. The first issue is more pronounced for lower-quality (less complete) assemblies.
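The transitive loss described above can be illustrated with a toy model. This is a sketch under a loud, hypothetical assumption: each branch of the guide tree independently keeps a fixed fraction ("recall") of the truly homologous bases. The tree, the recall values and the function names are all invented for illustration; this is not how Cactus computes or filters alignments. It only shows why a longer path through ancestors, or a poorly assembled genome on that path, can only lower the fraction of bases that end up aligned.

```python
import math


def path_to_root(node, parent):
    """Return [node, parent, grandparent, ..., root] for a tree given as a child->parent dict."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def transitive_recall(a, b, parent, recall):
    """Toy estimate of a->b sensitivity: product of per-branch recall along the tree path.

    Assumption (hypothetical): every branch, named by its child node, keeps
    recall[child] of the homologous bases, independently of all other branches.
    """
    up_a = path_to_root(a, parent)
    up_b = path_to_root(b, parent)
    ancestors_b = set(up_b)
    # Lowest common ancestor: first node on a's root path that is also on b's.
    lca = next(n for n in up_a if n in ancestors_b)
    # Every branch strictly below the LCA on either side lies on the a-b path.
    branches = up_a[:up_a.index(lca)] + up_b[:up_b.index(lca)]
    return math.prod(recall[n] for n in branches)


# Hypothetical tree: ((human, mouse)Anc1, (batA, batB)Anc2)Anc0
parent = {"human": "Anc1", "mouse": "Anc1", "batA": "Anc2", "batB": "Anc2",
          "Anc1": "Anc0", "Anc2": "Anc0"}
recall = {node: 0.98 for node in parent}
recall["batB"] = 0.90  # hypothetical lower-quality (less complete) assembly
```

With these made-up numbers, human to batA crosses four branches (0.98^4, about 0.92), while batA to batB crosses only two; a weak branch such as batB lowers every pair whose path runs through it.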
We wanted to compare our new Cactus alignments with the ones we previously obtained with multiz. Our test set consisted of 16 bat genomes aligned with hg38. We compared coverage and identity statistics (using mafCoverage) and found that Cactus was ~3% lower on average. What could explain this? Perhaps the hal2maf conversion options? (I tried several combinations, with more or less similar results.) I also used --dupeMode single to avoid self-alignments. Is there something else we are not taking into account? Or are these metrics unsuitable for this comparison for some reason?
Thanks a lot for your help! Evgeny
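One way to make sure both MAFs are scored identically is to compute coverage and identity with a single, explicit definition. The sketch below is a hypothetical helper, not mafCoverage and not necessarily its exact definitions. Assumptions: sequence names follow a `genome.sequence` pattern, coverage is the fraction of reference bases (out of a reference length you supply) with at least one aligned non-gap base in a species, identity is the fraction of those covered bases where some aligned copy matches the reference base (case-insensitive), and duplicated copies in a block are collapsed so each reference column counts at most once per species.

```python
from collections import defaultdict


def parse_maf_blocks(lines):
    """Yield one list of (genome, aligned_text) pairs per MAF alignment block."""
    block = []
    for line in lines:
        line = line.strip()
        if line.startswith("a ") or line == "a":
            if block:
                yield block
            block = []
        elif line.startswith("s "):
            # s <src> <start> <size> <strand> <srcSize> <text>
            fields = line.split()
            src, text = fields[1], fields[6]
            genome = src.split(".", 1)[0]  # assumes "genome.sequence" naming
            block.append((genome, text))
    if block:
        yield block


def coverage_and_identity(lines, ref, ref_length):
    """Return {genome: (coverage, identity)} relative to the reference genome `ref`."""
    covered = defaultdict(int)    # reference bases with >= 1 aligned base in the genome
    identical = defaultdict(int)  # covered bases where some aligned copy matches the reference
    for block in parse_maf_blocks(lines):
        ref_rows = [text for genome, text in block if genome == ref]
        if not ref_rows:
            continue
        ref_text = ref_rows[0]  # simplification: only the first reference row is scored
        others = defaultdict(list)
        for genome, text in block:
            if genome != ref:
                others[genome].append(text)
        for genome, rows in others.items():
            for col, ref_base in enumerate(ref_text):
                if ref_base == "-":
                    continue  # insertion relative to the reference: not a reference base
                bases = [row[col] for row in rows if row[col] != "-"]
                if bases:  # duplicated copies collapse to one covered column
                    covered[genome] += 1
                    if any(b.upper() == ref_base.upper() for b in bases):
                        identical[genome] += 1
    return {g: (covered[g] / ref_length, identical[g] / covered[g]) for g in covered}
```

Running the same function on the Cactus-derived and the multiz MAF, with the same reference length, gives directly comparable numbers; if the gap persists, it reflects the alignments rather than differences in how the two files were scored.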