McTavishLab / physcraper

Welcome to Physcraper’s repository! Automatic gene tree updating using the Open Tree of Life.
https://physcraper.readthedocs.io/en/main/
GNU General Public License v3.0

reviewer 2 - comment 7 - explain polytomy handling in docs #171

Closed LunaSare closed 3 years ago

LunaSare commented 3 years ago

"How does it handle polytomies, are all trees in OToL fully resolved? Furthermore, is it possible to know when there are conflicting trees across studies? This is an OToL question actually, but still it would be good to be addressed in the documentation here."

LunaSare commented 3 years ago

We added a FAQ section to the documentation, where we elaborate on this.

"How does it handle polytomies, are all trees in OToL fully resolved?"

Physcraper's starting trees are not synthetic OpenTrees, which indeed usually contain many polytomies. Instead, Physcraper uses the input phylogenies (published phylogenies) that were used to synthesize the OpenTree, and these generally have fewer polytomies. Even if the starting tree has polytomies, these should not significantly affect the results of the Physcraper analysis. This is how we address it in the documentation:

The Physcraper starting tree is a phylogeny whose tip labels must have been standardized to the OpenTree Taxonomy (as described in the Introduction section:
[Mapping names to taxa](https://physcraper.readthedocs.io/en/latest/how_to_start.html#updating-your-own-tree-and-alignment)).
Original tip labels of the starting tree must be identical to taxon labels on the starting alignment.
However, not all taxon labels in the alignment have to be present in the tree, and
vice versa.
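As an illustration of the label requirement above, the following minimal sketch compares tree tip labels with alignment taxon labels and reports which are shared and which appear on only one side. It is not Physcraper code: the function name `compare_labels` and the example labels are hypothetical, and the sketch assumes the labels are already available as plain lists of strings.

```python
def compare_labels(tree_tips, alignment_taxa):
    """Split labels into shared, tree-only and alignment-only groups."""
    tree_set = set(tree_tips)
    aln_set = set(alignment_taxa)
    return {
        "shared": sorted(tree_set & aln_set),          # must match exactly
        "tree_only": sorted(tree_set - aln_set),       # allowed, but unmatched
        "alignment_only": sorted(aln_set - tree_set),  # allowed, but unmatched
    }


# Hypothetical labels, for illustration only.
report = compare_labels(
    tree_tips=["taxon_A", "taxon_B", "taxon_C"],
    alignment_taxa=["taxon_B", "taxon_C", "taxon_D"],
)
print(report)
```

Labels that differ only in spelling or spacing end up in the "only" groups, so a check like this can help spot mismatches before starting a run.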

Physcraper makes use of the starting tree in four main ways:
1. to delimit a taxon for the GenBank search (a search taxon),
2. to be used as the starting tree for the phylogenetic reconstruction software of choice,
3. to standardize the taxon names from the starting alignment, and
4. to compare the updated phylogenetic relationships with the original ones.

Physcraper does not really "handle" polytomies. The goal of the software is to use the
existing phylogenetic information that has been generated, reviewed, published and curated by experts in the field.

If a starting tree contains polytomies, these can only affect the outcome of the analysis through use (1), delimiting a taxon for the GenBank search.
To delimit the search taxon from the starting tree, a known outgroup is necessary.
The outgroup can be user-defined. If the user does not define an outgroup, Physcraper will attempt to root the starting tree following the OpenTree Taxonomy.
If successful, it will take the tip labels from the earliest diverging branch with the fewest tips and use them as the outgroup. However, if the starting tree has polytomies around the early diverging branches, the automatic rooting is problematic and can have multiple solutions.
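To make the ambiguity concrete, here is a minimal sketch of the "branch with the fewest tips" rule applied to an already rooted tree. It is not Physcraper's implementation: the nested-tuple tree representation and the function names `tips` and `outgroup_candidates` are assumptions made for illustration only.

```python
def tips(node):
    """Return the tip labels under a node; a tip is a plain string."""
    if isinstance(node, str):
        return [node]
    return [label for child in node for label in tips(child)]


def outgroup_candidates(root):
    """Return the tip sets of the root's child branches with the fewest tips."""
    branches = [sorted(tips(child)) for child in root]
    fewest = min(len(branch) for branch in branches)
    return [branch for branch in branches if len(branch) == fewest]


# Fully resolved root: exactly one smallest early-diverging branch.
resolved = ("out1", (("a", "b"), ("c", "d")))
# Polytomy at the root: two equally small branches compete.
polytomy = ("x", "y", (("a", "b"), ("c", "d")))

print(outgroup_candidates(resolved))  # [['out1']]
print(outgroup_candidates(polytomy))  # [['x'], ['y']]
```

With the resolved root there is a single candidate outgroup, while the polytomy yields two equally valid candidates. In that situation, defining the outgroup explicitly avoids the ambiguity.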

"is it possible to know when there are conflicting trees across studies?"

Yes, the "conflict tool" is one of the services provided by the OpenTree of Life project, which is implemented as a graphical interface https://tree.opentreeoflife.org/curator/study/view/ot_1843/?tab=trees&tree=Tr112663&conflict=ott on the OpenTree website. By default, Physcraper uses the API of the conflict tool to generate a summary of conflict between the starting tree and the Physcraper updated tree. It can also generate this summary for any pair of user-defined trees.