Splitting up reconciliation step into multiple runs

davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics

GNU General Public License v3.0


I'm running OrthoFinder 2.2.7 on about 400 bacterial genomes. It has been working well, but the final step (starting from a species tree and orthogroups using the -fg and -s options) only gets through about 1,000 of 30,000 orthogroups in 96 hours when running on 4 cores (-a 4) with 62 GB of memory.

96 hours is the longest run time allowed for the standard queue on the HPC system I'm using. While there are options for longer run times, I'm reluctant to commit to a potentially very long batch job if there is a way to break the work up into multiple jobs.

So, I'm wondering if there is a way to break the orthogroups into several sets and run the reconciliation algorithm on each set separately?
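To make the idea concrete, here is a minimal Python sketch of the partitioning step only. It is not an OrthoFinder feature: whether OrthoFinder can run reconciliation on a subset of orthogroups is exactly the open question here. The orthogroup IDs, the `split_into_batches` helper, and the batch size are all hypothetical; the only numbers taken from the run above are roughly 1,000 orthogroups completed per 96-hour window and about 30,000 orthogroups in total.

```python
import math

def split_into_batches(items, n_batches):
    """Split a list into n_batches contiguous, near-equal batches."""
    size = math.ceil(len(items) / n_batches)
    return [items[i:i + size] for i in range(0, len(items), size)]

# Hypothetical orthogroup IDs; real IDs would come from the orthogroups file.
orthogroups = [f"OG{i:07d}" for i in range(30000)]

# Observed throughput from the run described above (approximate).
orthogroups_per_window = 1000   # orthogroups finished in one queue window
window_hours = 96               # maximum run time on the standard queue

# Number of queue windows needed if the rate stays constant.
n_batches = math.ceil(len(orthogroups) / orthogroups_per_window)
batches = split_into_batches(orthogroups, n_batches)

# Total wall time if the batches were run one after another.
total_hours = n_batches * window_hours
```

At the observed rate this gives 30 batches of 1,000 orthogroups, or about 2,880 hours if run serially. In practice each batch would need to be smaller than 1,000 to leave some margin under the 96-hour limit.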

I'm using the DendroBLAST method rather than multiple sequence alignments, and gene tree construction seems to take less than 24 hours. I'm also using the binary version of OrthoFinder, running on CentOS 7.5.

Thank you.

Splitting up reconciliation step into multiple runs #241