brendane opened this issue 5 years ago
Hi Brendan
It's a good question, and I've had trouble with this one myself in the past too. I've not tackled it yet because, for your 400 genomes, OrthoFinder will be writing out 400x400 = 160,000 orthologue files. At the same time, each orthogroup (which is what would be parallelised over) could contain pairs of orthologues from every species pair, meaning each task could have to write to every one of those files! It's not insurmountable, but it will involve careful management of inter-dependent parallel tasks. It's a good reminder though; I'll have a look and see how much work it would be.
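To illustrate the write-contention problem described above: a common pattern for this kind of many-writers situation (sketched here with hypothetical names, not OrthoFinder's actual implementation) is to have each orthogroup task return its output lines grouped by species pair, and let a single merge pass do all the file writes:

```python
from collections import defaultdict

def process_orthogroup(og_id, gene_pairs):
    # Hypothetical worker: instead of appending to the shared
    # per-species-pair files directly, return the lines it would write.
    out = defaultdict(list)
    for sp1, gene1, sp2, gene2 in gene_pairs:
        out[(sp1, sp2)].append(f"{og_id}\t{gene1}\t{gene2}")
    return out

def merge(worker_results):
    # Single-writer merge pass: only this step touches the files,
    # so the orthogroup tasks themselves stay independent.
    merged = defaultdict(list)
    for result in worker_results:
        for pair, lines in result.items():
            merged[pair].extend(lines)
    return merged  # in practice, append each merged[pair] to its file

# Toy run: two orthogroups, both with orthologues between species A and B
r1 = process_orthogroup("OG0000001", [("A", "a1", "B", "b1")])
r2 = process_orthogroup("OG0000002", [("A", "a2", "B", "b2")])
combined = merge([r1, r2])
```

With this shape, the expensive per-orthogroup work parallelises freely and only the cheap merge step is serial.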
All the best David
I'm running OrthoFinder 2.2.7 on about 400 bacterial genomes. It has been working well, but the final step (starting from a species tree and orthogroups with the -fg and -s options) only gets through about 1,000 of the 30,000 orthogroups in 96 hours when running on 4 cores (-a 4) with 62 GB of memory.
96 hours is the longest run time allowed in the standard queue on the HPC system I'm using. While there are options for longer run times, I'm reluctant to commit to a potentially very long batch job if there is a way to break the work up into multiple jobs.
So, I'm wondering if there is a way to break the orthogroups into several sets and run the reconciliation algorithm on each set separately?
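For concreteness, the kind of split I have in mind would look something like this (a hypothetical helper over the lines of an Orthogroups.txt-style file; whether OrthoFinder 2.2.7 can actually consume such a subset is exactly what I'm asking):

```python
def chunk_orthogroups(og_lines, n_chunks):
    # Round-robin split of orthogroup lines into roughly equal batches
    # (hypothetical helper for illustration, not an OrthoFinder feature).
    chunks = [[] for _ in range(n_chunks)]
    for i, line in enumerate(og_lines):
        chunks[i % n_chunks].append(line)
    return chunks

# Toy run: 5 orthogroup lines split into 2 batches
lines = [f"OG{i:07d}: geneA geneB" for i in range(5)]
batches = chunk_orthogroups(lines, 2)
```

Each batch could then be submitted as its own job within the 96-hour limit.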
I'm using the dendroblast method, not using multiple alignments, and gene tree construction seems to take less than 24 hours. I am also using the binary version of OrthoFinder and running it on CentOS 7.5.
Thank you.