Arcadia-Science / noveltree

NovelTree is a highly parallelized and computationally efficient phylogenomic workflow that infers gene families, gene family trees, species trees, and gene family evolutionary history.
GNU Affero General Public License v3.0

External Species Tree and AleRax #98

Open Alexggo opened 6 months ago

Alexggo commented 6 months ago

Description of feature

Hello, would it be possible to use an external dated species tree (Newick or Nexus format) as input instead of inferring it from the alignment data with SpeciesRax? In addition, GeneRax uses an undated model, which has a few issues handling transfers that are not contemporary. Can the pipeline be adjusted to incorporate AleRax to restrict transfers that aren't contemporary?

austinhpatton commented 6 months ago

Hi Alex - Thanks for your interest in NovelTree!

Yes, I think we should certainly be able to implement this in the upcoming release. Out of curiosity, are you anticipating providing a fixed species tree that is already rooted? If an unrooted tree is provided, it will likely still be best to root it internally using SpeciesRax.
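As a rough illustration of the rooted/unrooted distinction, here is a minimal Python sketch that guesses whether a Newick string is rooted by counting the subtrees directly under the root. It relies on the common convention that a bifurcating root means rooted and a root with three or more children means unrooted. That convention is an assumption, not a guarantee, and the function names `count_root_children` and `looks_rooted` are hypothetical, not part of NovelTree or SpeciesRax. Bracketed Newick comments are not handled.

```python
def count_root_children(newick: str) -> int:
    """Count the subtrees directly under the root of a Newick string.

    Simplified parser: tracks parenthesis depth and single-quoted labels,
    but does not handle bracketed comments such as [&R].
    """
    depth = 0
    children = 0
    in_quote = False
    for ch in newick.strip():
        if ch == "'":
            # Toggle quoted-label state; doubled quotes toggle twice.
            in_quote = not in_quote
        elif in_quote:
            continue
        elif ch == "(":
            depth += 1
            if depth == 1:
                children = 1  # the root has at least one child
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 1:
            children += 1  # a comma at depth 1 separates root children
        elif ch == ";":
            break
    return children


def looks_rooted(newick: str) -> bool:
    """Heuristic: treat a tree with exactly two root children as rooted."""
    return count_root_children(newick) == 2


if __name__ == "__main__":
    print(looks_rooted("((A:1,B:1):0.5,(C:1,D:1):0.5);"))
    print(looks_rooted("(A:1,B:1,(C:1,D:1):0.5);"))
```

A check like this could only flag trees that likely need rooting; it says nothing about whether an existing root is placed correctly.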

And you're right that GeneRax unfortunately implements an undated model that, as you say, can lead to some unwanted behavior. In the upcoming release, we anticipate including an updated version of GeneRax that should mitigate some of these issues, although it still does not implement a dated model.

As for implementing AleRax, this is something we've been considering, though it's not something we are actively planning on pursuing. Doing so would require extensive upstream and downstream changes to the workflow, as AleRax needs bootstrapped distributions of trees passed to it for the reconciliations. But again, it's something we've been looking into doing!

Out of curiosity, I'd be interested in hearing whether you have a sense of the computational efficiency of AleRax relative to GeneRax. One thing we've been working on is implementing alternative methods for gene family tree reconciliation that are more computationally efficient, as these tasks are currently one of the major bottlenecks in the workflow.

Thanks again! Austin