donovan-h-parks / PhyloRank

Assign taxonomic ranks based on evolutionary divergence.
GNU General Public License v3.0

format of TRUSTED_TAXA_FILE #16

Closed: xvazquezc closed this issue 3 years ago

xvazquezc commented 3 years ago

Hi there, does the TRUSTED_TAXA_FILE have any specific formatting requirements? I've been trying to use that option, but I can't get it to work: the file's contents always seem to be ignored (trusted.txt is a list of phyla). I get something like this:

$ phylorank outliers  -t trusted.txt concat.decorated.treefile concat.decorated.treefile-taxonomy ./outliers2
[2021-05-03 12:02:36] INFO: PhyloRank v0.1.10
[2021-05-03 12:02:36] INFO: phylorank outliers -t trusted.txt concat.decorated.treefile concat.decorated.treefile-taxonomy ./outliers2
[2021-05-03 12:02:36] INFO: Reading tree.
[2021-05-03 12:02:36] INFO: Reading taxonomy.
[2021-05-03 12:02:36] INFO: Reading taxonomy from tree.
[2021-05-03 12:02:36] INFO: Identified 0 taxa for use in inferring RED distribution.

Rank    Taxa to Plot    Taxa for Inference
class   5   0
order   13  0
genus   13  0
family  18  0

[2021-05-03 12:02:36] INFO: Identified 0 phyla.
[2021-05-03 12:02:36] INFO: Using 0 phyla as rootings for inferring distributions.
[2021-05-03 12:02:36] ERROR: Rescaling requires at least 2 valid phyla.

  Controlled exit resulting from an unrecoverable error or warning.

I only seem able to make it run if I use the --fixed_root option. This is not a whole-domain tree but a section of one.

donovan-h-parks commented 3 years ago

Hi. The RED value calculated by PhyloRank is the median value across all valid phylum rootings. It appears your tree does not define any phyla. As such, it will only proceed if you specify the --fixed_root flag, which indicates it should calculate RED using the rooting as it exists in your provided tree.
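To make the median-over-rootings idea concrete, here is a minimal Python sketch. It assumes you already have, for one taxon, a RED value computed under each candidate phylum rooting; the function name `median_red` and the dictionary layout are hypothetical and not PhyloRank's API. The minimum of two rootings mirrors the "Rescaling requires at least 2 valid phyla" error in the log above.

```python
from statistics import median
from typing import Mapping, Optional


def median_red(red_by_rooting: Mapping[str, float],
               fixed_root_red: Optional[float] = None) -> float:
    """Return a taxon's RED as the median over phylum rootings (hypothetical helper).

    red_by_rooting maps a phylum name (the rooting used) to the taxon's RED
    under that rooting. If fixed_root_red is given, it is returned unchanged,
    mimicking the idea of using the tree's existing root instead.
    """
    if fixed_root_red is not None:
        return fixed_root_red
    if len(red_by_rooting) < 2:
        # Same condition as the error reported in the log
        raise ValueError("Rescaling requires at least 2 valid phyla.")
    return median(red_by_rooting.values())


print(median_red({"p__A": 0.41, "p__B": 0.45, "p__C": 0.43}))  # 0.43
```

With zero or one usable phylum there is nothing to take a median over, which is why a fixed root is the only way forward in that case.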

xvazquezc commented 3 years ago

There are actually 9 phyla in the tree, but each has only a single class. I guess that is the issue.
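One way to check this guess is to count the distinct classes under each phylum in the taxonomy file. The sketch below assumes a two-column, tab-separated file (genome ID, then a semicolon-separated taxonomy string with GTDB-style `p__`/`c__` prefixes); the helper names are hypothetical. Whether PhyloRank actually treats single-class phyla as invalid rootings is the hypothesis being tested here, not something this code establishes.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Set


def classes_per_phylum(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each phylum (p__...) to the set of classes (c__...) listed under it.

    Assumes each line is '<genome_id>\t<rank1;rank2;...>' with GTDB-style
    rank prefixes such as 'd__', 'p__' and 'c__'.
    """
    result: Dict[str, Set[str]] = defaultdict(set)
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        ranks = [r.strip() for r in row[1].split(";")]
        # Ignore empty placeholders such as a bare 'c__'
        phylum = next((r for r in ranks if r.startswith("p__") and len(r) > 3), None)
        klass = next((r for r in ranks if r.startswith("c__") and len(r) > 3), None)
        if phylum and klass:
            result[phylum].add(klass)
    return dict(result)


def single_class_phyla(lines: Iterable[str]) -> Set[str]:
    """Phyla represented by exactly one class in the taxonomy."""
    return {p for p, cs in classes_per_phylum(lines).items() if len(cs) == 1}
```

For example, `with open("concat.decorated.treefile-taxonomy") as fh: print(sorted(single_class_phyla(fh)))` would list the phyla that have only one class, assuming the file follows the layout above.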

As such, it will only proceed if you indicate the --fixed_root flag which indicates it should calculate RED using the rooting as it exists in your provided tree.

Aside from that, would there be any fundamental difference compared to not using the --fixed_root flag? I mean in terms of, e.g., the interpretation and/or reliability of the values in the output.

donovan-h-parks commented 3 years ago

Assuming you trust the rooting of your tree, using a fixed root is preferable. The challenge with the GTDB is that there isn't an accepted rooting for the bacterial tree, so we make the pragmatic decision to take the median value over plausible rootings.
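Calculating RED under a fixed root, as recommended above when the rooting is trusted, can be sketched as follows. This assumes the RED definition published for the GTDB: the root has RED 0, each leaf has RED 1, and an internal node n gets p + (d/u)(1 - p), where p is the parent's RED, d is the branch length from the parent to n, and u is the average branch length from the parent to the extant taxa below n. The `Node` class and function names are hypothetical, not PhyloRank's code, and node names are assumed to be unique.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    name: str
    length: float = 0.0  # branch length to the parent
    children: List["Node"] = field(default_factory=list)


def mean_leaf_distance(node: Node) -> float:
    """Average path length from node down to each leaf beneath it."""
    if not node.children:
        return 0.0
    dists = []
    stack = [(c, c.length) for c in node.children]
    while stack:
        n, d = stack.pop()
        if n.children:
            stack.extend((c, d + c.length) for c in n.children)
        else:
            dists.append(d)
    return sum(dists) / len(dists)


def red_fixed_root(root: Node) -> Dict[str, float]:
    """RED for every node using the tree's existing root (RED 0 at the root)."""
    red = {root.name: 0.0}
    stack = [(c, 0.0) for c in root.children]  # (node, parent's RED)
    while stack:
        node, p = stack.pop()
        d = node.length
        # u: average branch length from the parent to the leaves below this node
        u = d + mean_leaf_distance(node)
        r = 1.0 if u == 0 else p + (d / u) * (1.0 - p)
        red[node.name] = r
        stack.extend((c, r) for c in node.children)
    return red


tree = Node("root", children=[
    Node("A", 1.0, [Node("a1", 1.0), Node("a2", 1.0)]),
    Node("b", 2.0),
])
print(red_fixed_root(tree)["A"])  # 0.5
```

Leaves always come out at 1.0, because for a leaf u equals d. Under the median approach described in the thread, a calculation like this would be repeated for each plausible rooting and the per-taxon median taken.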