davidemms / OrthoFinder

Phylogenetic orthology inference for comparative genomics
https://davidemms.github.io/
GNU General Public License v3.0

Can OrthoFinder handle tens of thousands of genomes? #668

Open aziele opened 2 years ago

aziele commented 2 years ago

OrthoFinder needs BLAST/Diamond output as multiple pairwise comparison files stored in a single directory. This is problematic for a large number of genomes: for 50,000 genomes this would result in 2.5 billion files (50,000 × 50,000 pairwise comparisons).

Is there an alternative way to provide BLAST output to OrthoFinder? For example, is it possible to limit the number of BLAST files to the number of genomes (i.e. one file per proteome, each searched against a single database containing the proteins from all the proteomes)?
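
For reference, the file counts of the two layouts discussed above can be checked with a few lines of arithmetic. This is only a sketch: the function names are hypothetical, and it assumes one result file per ordered (query, target) pair with self-comparisons included, which is the assumption that reproduces the 2.5 billion figure.

```python
def pairwise_file_count(n_genomes: int) -> int:
    """Files needed when every ordered (query, target) pair gets its own file.

    Assumption: self-comparisons are included, so the count is n * n.
    """
    return n_genomes * n_genomes


def per_genome_file_count(n_genomes: int) -> int:
    """Files needed when each proteome is searched once against a combined database."""
    return n_genomes


n = 50_000
print(f"pairwise layout:   {pairwise_file_count(n):,} files")
print(f"per-genome layout: {per_genome_file_count(n):,} files")
```

The pairwise layout grows quadratically with the number of genomes, while the per-genome layout grows linearly. That covers the number of files only, not the RAM or runtime of the analysis itself.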

davidemms commented 2 years ago

Hi aziele

This currently isn't possible. Even if you were to resolve the problem of separate pairwise files, there are other scaling issues linked to computational requirements (RAM, runtime) that would still prevent OrthoFinder from carrying out such a large-scale analysis. It is possible that I will be able to do something this year to allow scaling to any number of species, but I can't offer anything at the moment.

Best wishes,
David