ComparativeGenomicsToolkit / cactus

Official home of genome aligner based upon notion of Cactus graphs

On species tree and branch lengths #1337

Open diego-rt opened 5 months ago

diego-rt commented 5 months ago

Hello,

First of all, thanks a lot for your very exciting (and complex) pipeline. I'm really looking forward to seeing the results!

I'm struggling a bit with the species tree, though. I would like to generate a Cactus whole-genome alignment for a number of amniote species and was wondering what the most effective way to calculate the branch lengths is. I followed your discussion in the FAQ and in other issues, where you mention that mashtree is not ideal for longer branch lengths. Do you have any advice on how to calculate the branch lengths for, e.g., an alignment spanning from mammals to birds like the one in your 2020 publication?

I have an OrthoFinder-derived species tree for my species. Is it possible to use these distances (in subs/site, but calculated from a concatenated MSA of single-copy genes)? Or do the distances have to be derived from neutrally evolving (i.e. 4-fold degenerate) sites? Would mashtree be preferable in that case?

I would like to prioritise the accuracy of the genome reconstructions (particularly at intergenic regions) so I'm happy to hear whatever you think would lead to the best outcome.

Thanks a lot!

glennhickey commented 5 months ago

You can try PhyloFit on 4-fold degenerate sites as described here: https://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg38&g=cons241way
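
To make "4-fold degenerate sites" concrete, here is a minimal Python sketch of how such sites could be pulled out of a codon alignment. It is an illustration only, not the procedure used for the linked track: the function name `fourfold_columns`, the input layout (a dict of in-frame, gap-aligned CDS strings), and the conservative filter requiring every species to share the same first two codon bases are all assumptions made for this example. It relies on the standard genetic code, in which eight codon families (Leu CTN, Val GTN, Ser TCN, Pro CCN, Thr ACN, Ala GCN, Arg CGN, Gly GGN) encode the same amino acid regardless of the third base.

```python
# First two codon bases whose third position is fourfold degenerate
# under the standard genetic code (Leu, Val, Ser, Pro, Thr, Ala, Arg, Gly).
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_columns(aligned_cds):
    """Return the 4-fold degenerate third-position bases for each species.

    aligned_cds: dict mapping species name -> in-frame, gap-aligned CDS
    string (hypothetical input layout; all strings must be the same length).
    """
    names = list(aligned_cds)
    lengths = {len(seq) for seq in aligned_cds.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()

    kept = {name: [] for name in names}
    # Walk the alignment codon by codon, ignoring any trailing partial codon.
    for start in range(0, length - length % 3, 3):
        codons = [aligned_cds[name][start:start + 3].upper() for name in names]
        prefixes = {codon[:2] for codon in codons}
        # Assumed conservative filter: every species shares one 4-fold prefix.
        if len(prefixes) != 1 or prefixes.pop() not in FOURFOLD_PREFIXES:
            continue
        # Skip codons with a gap or ambiguous base at the third position.
        if any(codon[2] not in "ACGT" for codon in codons):
            continue
        for name, codon in zip(names, codons):
            kept[name].append(codon[2])
    return {name: "".join(bases) for name, bases in kept.items()}


example = {
    "human": "GCTATGGGA",
    "mouse": "GCCATGGGG",
    "chicken": "GCAATGGGT",
}
print(fourfold_columns(example))
```

In this toy input, the first codon (GCN, Ala) and third codon (GGN, Gly) pass the filter, while ATG does not, so each species keeps two bases. The resulting columns could then be written out as an alignment and given, together with the species tree topology, to a tree-fitting tool such as phyloFit to estimate branch lengths.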

I know Siavash (and collaborators) have a new tool coming out soon to make this easier, and I'll put a link on the cactus page once it's released.