TheBrownLab / PhyloFisher

PhyloFisher is a software package written in Python3 that can be used for the creation, analysis, and visualization of phylogenomic datasets that consist of eukaryotic protein sequences.
MIT License

Running PhyloFisher across multiple nodes #110

Closed matiasWanntorp closed 9 months ago

matiasWanntorp commented 9 months ago

Hello! I was wondering whether PhyloFisher supports parallel execution using MPI, or whether there is some other way of running sgt_constructor.py across multiple nodes at once? Using just one node with 20 CPUs takes a very long time with the dataset I have.

All the best and thank you ahead of time!

atice commented 9 months ago

Hi @matiasWanntorp,

Unfortunately, MPI is not supported in PhyloFisher. One strategy is to split the output of working_dataset_constructor.py into separate directories. For example, if each node has 20 CPUs, you could make 12 directories with 20 genes each, then use each of the 12 directories as input for an independent run of sgt_constructor.py. When all runs of sgt_constructor.py finish, merge all of the sgt_constructorout<M.D.Y>/trees directories into one directory and use it as the input for forest.py. Let me know if anything is unclear or if you need help with anything else. Thank you for considering PhyloFisher for your research.

Alex
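
The split-and-merge strategy described above could be scripted roughly as follows. This is only a minimal sketch, not part of PhyloFisher: the helper names `split_into_batches` and `merge_tree_dirs`, the `batch_NN` directory naming, and the assumption that the working_dataset_constructor.py output is a flat directory with one file per gene are all hypothetical. Running sgt_constructor.py on each batch is left to your scheduler and is not shown.

```python
import shutil
from pathlib import Path


def split_into_batches(src_dir, dest_root, batch_size=20):
    """Copy the files in src_dir into dest_root/batch_01, batch_02, ...

    Assumes (hypothetically) that src_dir holds one file per gene.
    Returns the list of batch directories that were created.
    """
    files = sorted(p for p in Path(src_dir).iterdir() if p.is_file())
    batch_dirs = []
    for start in range(0, len(files), batch_size):
        batch_dir = Path(dest_root) / f"batch_{start // batch_size + 1:02d}"
        batch_dir.mkdir(parents=True, exist_ok=True)
        for gene_file in files[start:start + batch_size]:
            shutil.copy2(gene_file, batch_dir / gene_file.name)
        batch_dirs.append(batch_dir)
    return batch_dirs


def merge_tree_dirs(tree_dirs, merged_dir):
    """Copy every file from each 'trees' directory into merged_dir.

    Raises FileExistsError on a name collision rather than overwriting,
    since each gene should appear in exactly one batch.
    """
    merged = Path(merged_dir)
    merged.mkdir(parents=True, exist_ok=True)
    for tree_dir in tree_dirs:
        for tree_file in sorted(Path(tree_dir).iterdir()):
            if not tree_file.is_file():
                continue
            target = merged / tree_file.name
            if target.exists():
                raise FileExistsError(f"{target} already exists")
            shutil.copy2(tree_file, target)
    return merged
```

With 240 gene files and `batch_size=20`, `split_into_batches` would produce the 12 directories mentioned above. After the independent runs finish, pass the list of their trees directories to `merge_tree_dirs` and point forest.py at the merged directory.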

matiasWanntorp commented 9 months ago

Hello again! Thank you for your answer, it makes a lot of sense!