AstrobioMike / GToTree

A user-friendly workflow for phylogenomics
GNU General Public License v3.0

FastTreeMP for multithreading #33

Closed. hyphaltip closed this 3 years ago.

hyphaltip commented 3 years ago

It seems like, if you request multiple processors, the FastTree step should try to use FastTreeMP so the tree step can gain a speedup?

AstrobioMike commented 3 years ago

Hey there, Jason :)

That’s a good suggestion/point. When first putting things together, I decided against it because parallel FastTree isn’t deterministic, and the tree step generally takes much less time than the alignment step anyway. But it’s still time, and given the nature of a broad-level workflow like this, the lack of determinism shouldn’t really ever be an issue. I’m gonna add it in as the default, thanks for the note about it!

hyphaltip commented 3 years ago

Thanks Mike! Maybe the command-line option for the tree program could let one specify 'FastTreeMP' instead of 'FastTree' as the method? That way people still have the option to use the deterministic version if they feel it's necessary. I was using this as a screening tool to decide which datasets to remove before building a final version of the tree anyway, and we will likely do individual gene trees + coalescent or some partitioned analysis if we want to get deeper into testing species relationships too...

AstrobioMike commented 3 years ago

I completely agree: given the nature of a broad workflow like this, we're not operating at the level of resolution where this kind of thing (any potential differences between FastTree and FastTreeMP) would matter.

And yea, good idea. I added it as an option to the -T (tree program) command-line argument and made it the default when it's present on a system; otherwise it still defaults to FastTree (it seems FastTreeMP is packaged with the Linux conda install, but not the Mac one ¯\_(ツ)_/¯).
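For illustration, the selection behaviour described above (use FastTreeMP by default when it is installed, fall back to FastTree otherwise, and honour an explicit choice) can be sketched roughly as follows. This is a hypothetical Python helper, not GToTree's actual implementation; the function name `choose_tree_program` and its parameters are made up for this example. It assumes FastTreeMP is the OpenMP build of FastTree, so its thread count can be capped with the standard `OMP_NUM_THREADS` environment variable.

```python
import os
import shutil
from typing import Callable, Dict, Optional, Tuple


# Hypothetical helper (not GToTree code): pick the tree program the way the
# thread describes, i.e. prefer FastTreeMP when it is on PATH, else FastTree.
def choose_tree_program(
    requested: Optional[str] = None,
    threads: int = 1,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Tuple[str, Dict[str, str]]:
    env = dict(os.environ)
    if requested is None:
        # Default: FastTreeMP only if installed, otherwise the serial FastTree.
        program = "FastTreeMP" if which("FastTreeMP") else "FastTree"
    else:
        # Explicit user choice, e.g. the deterministic serial FastTree.
        program = requested
        if which(program) is None:
            raise FileNotFoundError(f"{program} not found on PATH")
    if program == "FastTreeMP":
        # Assumption: FastTreeMP is built with OpenMP, so OMP_NUM_THREADS
        # limits how many threads it uses.
        env["OMP_NUM_THREADS"] = str(threads)
    return program, env


if __name__ == "__main__":
    # Simulate a system (e.g. the Mac conda install) without FastTreeMP.
    program, env = choose_tree_program(threads=4, which=lambda name: None)
    print(program)  # FastTree
```

The returned `env` would then be passed along when launching the chosen program (e.g. via `subprocess.run(..., env=env)`). Injecting `which` keeps the choice testable without either binary installed.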

This is updated as of v1.5.50 (up on conda now).

Thanks again :)