ComparativeGenomicsToolkit / cactus

Official home of the genome aligner based upon the notion of Cactus graphs

Adjusting config.xml to allow multifurcations #1396

Closed. emistasis closed this issue 1 month ago.

emistasis commented 1 month ago

Hello!

I'm trying to run Cactus 2.8.2 on my university's HPC cluster through SLURM, so I needed to build a Singularity image for Cactus. I'm now stuck because my guide tree contains a multifurcation (a three-clade polytomy), and Cactus stops with the following error:

Node Anc15 has more than two children: ['Anc21', 'Anc22', 'Anc23']. Such nodes have been shown to drastically drop coverage in recent versions of Cactus. For best results, binarize your tree and try again. You can override this check by toggling "allow_multifurcations" to "1" in the configuration XML
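To spot such nodes before submitting a job, here is a small Python sketch. It is a hypothetical helper, not Cactus's own check. It parses a simple Newick string and lists every internal node with more than two children. It assumes unquoted labels and optional ":length" values. The toy tree and its leaf names are made up, with "Anc15" borrowed from the error message.

```python
# Hypothetical helper, not part of Cactus: parse a simple Newick string
# (unquoted labels, optional ":length") and report polytomies.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def parse_newick(text: str) -> Node:
    text = text.strip().rstrip(";")
    pos = 0

    def peek() -> str:
        return text[pos] if pos < len(text) else ""

    def read_until(stops: str) -> str:
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in stops:
            pos += 1
        return text[start:pos].strip()

    def subtree() -> Node:
        nonlocal pos
        node = Node()
        if peek() == "(":
            pos += 1  # consume '('
            node.children.append(subtree())
            while peek() == ",":
                pos += 1  # consume ','
                node.children.append(subtree())
            if peek() != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1  # consume ')'
        node.name = read_until(",():")
        if peek() == ":":
            pos += 1  # consume ':'
            node.length = float(read_until(",()"))
        return node

    root = subtree()
    if pos != len(text):
        raise ValueError(f"unexpected character at position {pos}")
    return root


def multifurcations(node: Node) -> List[Node]:
    """Return every node (pre-order) that has more than two children."""
    found = [node] if len(node.children) > 2 else []
    for child in node.children:
        found.extend(multifurcations(child))
    return found


tree = parse_newick("((A:1,B:1):1,(C:1,D:1):1,(E:1,F:1):1)Anc15;")
for n in multifurcations(tree):
    print(n.name, len(n.children))  # prints: Anc15 3
```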

I'm trying to bind-mount a modified config.xml that sets allow_multifurcations="1", but I'm not sure whether my command, or even the path to the config inside the image, is correct:

singularity exec --bind /scratch/user/emmarie.alexander/final_cactus_run/cactus/mod_config.xml:/cactus-2.8.2/src/cactus/cactus_progressive_config.xml cactus.img cactus ./jobstore cactus_run9_spp.txt cactus_run9_final.hal --consCores 1

I'd appreciate some guidance on how to move forward. I would prefer not to drop anything from the multifurcation, but I'm not sure whether binding a new config file will work in the first place.

glennhickey commented 1 month ago

As the error message says, you will likely get far better results by making your tree binary, i.e., choosing two clades from your polytomy and adding an internal node to group them together.
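As an illustration of what "adding an internal node" does to the tree, here is a minimal Python sketch. The Node class and group_children helper are hypothetical and not part of Cactus. It takes the three clades of the polytomy posted in this thread and wraps two of them in a new, unnamed internal node. Which two clades to group is a biological decision. The pair chosen here is arbitrary, and the new node is written without a branch length, which amounts to just adding brackets.

```python
# Hypothetical sketch (not Cactus code): resolve a polytomy by grouping two
# of its child clades under a new, unnamed internal node.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str = ""
    length: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def leaf(name: str, length: float) -> Node:
    return Node(name=name, length=length)


def to_newick(node: Node) -> str:
    """Write a subtree as Newick (no trailing ';'); unnamed nodes stay unnamed."""
    text = node.name
    if node.children:
        text = "(" + ",".join(to_newick(c) for c in node.children) + ")" + node.name
    if node.length is not None:
        text += f":{node.length}"
    return text


def group_children(node: Node, i: int, j: int, length: Optional[float] = None) -> None:
    """Replace children i and j of `node` with one new unnamed internal node."""
    n = len(node.children)
    if n < 3 or i == j or not (0 <= i < n and 0 <= j < n):
        raise ValueError("need a polytomy and two distinct, valid child indices")
    pair = [node.children[i], node.children[j]]
    rest = [c for k, c in enumerate(node.children) if k not in (i, j)]
    node.children = [Node(length=length, children=pair)] + rest


# The three clades of the polytomy quoted in this thread (branch lengths as posted).
clade1 = Node(length=0.0174055, children=[
    leaf("Sciurus_carolinensis", 0.132269), leaf("Muscardinus_avellanarius", 0.174474)])
clade2 = Node(length=0.102124, children=[
    leaf("Heterocephalus_glaber", 0.0750591), leaf("Cavia_porcellus", 0.129455)])
clade3 = Node(length=0.193741, children=[
    Node(length=0.0650601, children=[
        leaf("Rattus_norvegicus", 0.074297), leaf("Mus_musculus", 0.068167)]),
    leaf("Chionomys_nivalis", 0.119745)])
polytomy = Node(children=[clade1, clade2, clade3])

# Grouping the first two clades is only for illustration; pick the pair your
# phylogeny supports. length=None writes the new node with no branch length.
group_children(polytomy, 0, 1)
print(to_newick(polytomy) + ";")
```

The new node is left unnamed so that Cactus can assign its own ancestor names.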

But to answer your question, you can extract the config from the latest release and modify it:

wget -q https://github.com/ComparativeGenomicsToolkit/cactus/releases/download/v2.8.2/cactus-bin-v2.8.2.tar.gz
tar zxf cactus-bin-v2.8.2.tar.gz cactus-bin-v2.8.2/src/cactus/cactus_progressive_config.xml
sed -e 's/allow_multifurcations="0"/allow_multifurcations="1"/g' cactus-bin-v2.8.2/src/cactus/cactus_progressive_config.xml > mod_config.xml

Then run cactus with --configFile mod_config.xml. Again, you probably shouldn't do this because

Such nodes have been shown to drastically drop coverage in recent versions of Cactus.
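If you do go ahead with the override, you may want to confirm that the edit took effect before launching a long job. sed exits successfully even when its pattern matches nothing. Below is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name and the sample XML are hypothetical; only the attribute name comes from this thread. It makes no assumption about which element carries the attribute and simply scans them all.

```python
# Hypothetical check: confirm the sed edit actually changed the attribute,
# since sed exits normally even when its pattern matches nothing.
import xml.etree.ElementTree as ET
from typing import List


def multifurcation_settings(root: ET.Element) -> List[str]:
    """Collect the value of every allow_multifurcations attribute under root."""
    return [el.attrib["allow_multifurcations"] for el in root.iter()
            if "allow_multifurcations" in el.attrib]


# Stand-in XML for illustration only; the element names are made up and do
# not reflect the real layout of cactus_progressive_config.xml.
sample = '<config><example_section allow_multifurcations="1"/></config>'
print(multifurcation_settings(ET.fromstring(sample)))  # prints ['1']

# On the real file produced by the sed command:
# print(multifurcation_settings(ET.parse("mod_config.xml").getroot()))
```

An empty list would mean the attribute is missing. A "0" would mean the substitution did not match.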

emistasis commented 1 month ago

Thank you so much, this is super helpful!

I tried modifying the XML because I wasn't sure whether the internal node needed to be a biologically real node (i.e., adding another species) or whether I could use a label like "NodeA" to signify an arbitrary internal node. This time I tried including "NodeA", but I got an error saying that Cactus failed to parse the Newick tree, even though I was able to visualize the same tree in FigTree before running Cactus.

This is my polytomy structure:

((Sciurus_carolinensis:0.132269,Muscardinus_avellanarius:0.174474):0.0174055,(Heterocephalus_glaber:0.0750591,Cavia_porcellus:0.129455):0.102124,((Rattus_norvegicus:0.074297,Mus_musculus:0.068167):0.0650601,Chionomys_nivalis:0.119745):0.193741):0.0294113):0.00845075

This is what I adjusted it to:

(((Sciurus_carolinensis:0.132269,Muscardinus_avellanarius:0.174474)NodeA:0.0174055),(Heterocephalus_glaber:0.0750591,Cavia_porcellus:0.129455)NodeA:0.102124),((Rattus_norvegicus:0.074297,Mus_musculus:0.068167):0.0650601,Chionomys_nivalis:0.119745):0.193741):0.0294113):0.00845075
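When a Newick tree is edited by hand, a stray or missing bracket is an easy mistake to make. Balanced brackets are a necessary, though not sufficient, condition for a parser to accept the tree. Below is a minimal pre-flight sketch to run on the full tree file. The helper is hypothetical and not something Cactus provides, and it assumes no label contains a parenthesis.

```python
# Hypothetical pre-flight check (not part of Cactus): find the first unmatched
# ')' or report '(' left open. Assumes labels contain no parentheses.
from typing import Optional


def paren_problem(newick: str) -> Optional[str]:
    """Return a description of the first bracket problem, or None if balanced."""
    depth = 0
    for pos, ch in enumerate(newick):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return f"unmatched ')' at position {pos}"
    if depth > 0:
        return f"{depth} '(' never closed"
    return None


print(paren_problem("((A:1,B:1):1,C:1);"))  # prints None
print(paren_problem("(A:1,B:1));"))         # prints unmatched ')' at position 9
print(paren_problem("((A:1,B:1):1,C:1;"))   # prints 1 '(' never closed
```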

glennhickey commented 1 month ago

You can't have two NodeAs in your tree. I recommend just adding the brackets and letting Cactus come up with its own names for the ancestral nodes.
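You can check the unique-name rule before a run with a minimal sketch that lists any node label appearing more than once. The helper is hypothetical and assumes simple unquoted labels with no spaces, commas, colons, semicolons or parentheses.

```python
# Hypothetical check: list node labels that appear more than once in a Newick
# string. Assumes simple unquoted labels (no spaces, commas, colons or parens).
import re
from collections import Counter
from typing import List

# A label is a run of label characters directly after '(', ',' or ')'.
LABEL = re.compile(r"[(),]([^(),:;\s]+)")


def duplicate_labels(newick: str) -> List[str]:
    counts = Counter(LABEL.findall(newick))
    return sorted(name for name, n in counts.items() if n > 1)


print(duplicate_labels("((A:1,B:1)NodeA:1,(C:1,D:1)NodeA:1);"))  # prints ['NodeA']
print(duplicate_labels("((A:1,B:1):1,(C:1,D:1):1);"))            # prints []
```

Leaving internal nodes unnamed, as in the second call, avoids the problem entirely.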

emistasis commented 1 month ago

Ah, understood! That fixed it. Thanks so much for all your help! :-)