ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

Question about ancestral genome length too short #1329

Closed qiqizhang01 closed 2 months ago

qiqizhang01 commented 3 months ago

Dear author,

We are using Progressive Cactus to align 25 genomes from various species. However, when I ran halStats on the output, I noticed that the ancestral sequences are very short. For instance, in the image provided, Anc05 is only about 68 Mb long, whereas the 16 species within its branch belong to the same family and have genome sizes of 200-900 Mb.

I have thoroughly reviewed the cactus logs from my run, and no errors were reported. Is this outcome normal? Could it be related to the 9 outgroup species that I selected?

Below is the code I used:

cactus evolverjobs evolverjobs.txt evolverjobs.hal --realTimeLogging true

halStats evolverjobs.hal > evolverjobs_halStats.txt

Best regards,

[Image: WechatIMG1860]

glennhickey commented 3 months ago

I would use halCoverage rather than the ancestor sizes to check the alignment. But I agree they do seem small. Are these genomes hard masked? What's the branch length between them?

qiqizhang01 commented 3 months ago

1) The genomes are soft-masked, but I used EDTA to mask the TE sequences. I'm not sure whether this is why the ancestral sequences are too short. I will try masking with RepeatMasker instead.
2) The branch lengths between these species (including the outgroup species) range from 0.01 to 0.19; among the ingroup species they range from 0.01 to 0.12.

[Image]

halCoverage results are as follows:

[Image: halCoverage results]
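
As a rough cross-check of the ancestor sizes discussed in this thread, the per-genome table that halStats prints can be parsed and each ancestor's length compared with the leaf genomes. The Python sketch below is only an illustration: it assumes the halStats text output contains a comma-separated table whose header starts with GenomeName and whose second and third columns are NumChildren and Length. The function names and the sample numbers are hypothetical and are not taken from the actual run. It compares each ancestor with the median of all leaves, not just its own descendants, so it is a coarse indicator and not a replacement for halCoverage.

```python
import csv
import statistics

# Hypothetical sample in the assumed halStats layout (not data from this issue).
SAMPLE = """hal v2.1
((A:0.01,B:0.02)Anc1:0.03,C:0.1)Anc0;

GenomeName, NumChildren, Length, NumSequences, NumTopSegments, NumBottomSegments
Anc0, 2, 50000000, 100, 0, 2000
Anc1, 2, 68000000, 120, 1500, 2500
A, 0, 200000000, 30, 3000, 0
B, 0, 400000000, 40, 3500, 0
C, 0, 900000000, 50, 4000, 0
"""


def parse_halstats_table(text):
    """Return {genome: (num_children, length)} from halStats text output."""
    lines = text.splitlines()
    # Locate the header row of the per-genome table (assumed to start with GenomeName).
    start = next(i for i, line in enumerate(lines) if line.startswith("GenomeName"))
    genomes = {}
    for row in csv.reader(lines[start + 1:], skipinitialspace=True):
        if len(row) < 3:
            break  # a blank or short line ends the table
        genomes[row[0].strip()] = (int(row[1]), int(row[2]))
    return genomes


def ancestor_length_ratios(genomes):
    """Return {ancestor: length / median leaf length} for genomes with children."""
    leaf_lengths = [length for children, length in genomes.values() if children == 0]
    median_leaf = statistics.median(leaf_lengths)
    return {
        name: length / median_leaf
        for name, (children, length) in genomes.items()
        if children > 0
    }


if __name__ == "__main__":
    table = parse_halstats_table(SAMPLE)
    for name, ratio in sorted(ancestor_length_ratios(table).items()):
        print(f"{name}: {ratio:.2f} of median leaf length")
```

To apply it to a real run, the contents of a file such as evolverjobs_halStats.txt could be read and passed to parse_halstats_table in place of SAMPLE. Ancestors with a low ratio are the ones worth inspecting more closely with halCoverage.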