donovan-h-parks / PhyloRank

Assign taxonomic ranks based on evolutionary divergence.
GNU General Public License v3.0

Define how many new families I have #8

Open SilasK opened 4 years ago

SilasK commented 4 years ago

I have a bunch of MAGs which I annotated using GTDB-Tk, and I also built a tree based on the markers.

Suppose, for example, I have three genomes that are not annotated at the family level but belong to the same order. Can I use PhyloRank to show whether these genomes belong to one, two, or three new families?

donovan-h-parks commented 4 years ago

Yes. Does your tree span an entire domain? Establishing ranks requires comparing the relative evolutionary divergence (RED) between taxa at the same defined rank (i.e. it is a measure relative to your specific tree and not an absolute measure). Assuming you have this, you need to first decorate your tree and then calculate the RED values as described in the README. We are hoping to provide a better solution in the future, but this is ongoing work.
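
To make the "relative to your specific tree" point concrete, here is a minimal sketch of one common formulation of RED: the root gets 0, every leaf gets 1, and an internal node gets p + (d / u) * (1 - p), where p is the parent's RED, d is the branch length to the parent, and u is d plus the mean distance from the node to the leaves below it. This is an illustration under that assumption, not PhyloRank's actual code; `Node`, `red_values`, and the toy tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    length: float = 0.0  # branch length to the parent
    children: List["Node"] = field(default_factory=list)


def leaf_distances(node: Node) -> List[float]:
    """Path lengths from `node` down to each leaf below it."""
    if not node.children:
        return [0.0]
    dists: List[float] = []
    for child in node.children:
        dists.extend(child.length + d for d in leaf_distances(child))
    return dists


def red_values(root: Node) -> Dict[str, float]:
    """Assign RED to every node: root = 0, leaves = 1, internal nodes interpolated."""
    red: Dict[str, float] = {}

    def visit(node: Node, parent_red: Optional[float]) -> None:
        if parent_red is None:
            value = 0.0  # root
        elif not node.children:
            value = 1.0  # leaf
        else:
            d = node.length
            dists = leaf_distances(node)
            u = d + sum(dists) / len(dists)
            value = parent_red + (d / u) * (1.0 - parent_red) if u > 0 else parent_red
        red[node.name] = value
        for child in node.children:
            visit(child, value)

    visit(root, None)
    return red


# Hypothetical toy tree (branch lengths are made up).
tree = Node("root", children=[
    Node("A", 1.0, [
        Node("A1", 0.5, [Node("x", 0.5), Node("y", 0.5)]),
        Node("z", 1.0),
    ]),
    Node("w", 2.0),
])

red = red_values(tree)
print(red["A"], red["A1"])  # 0.5 0.75
```

Because every value depends on the branch lengths between the root and the leaves, the same clade can receive a different RED in a tree with different taxon sampling or rooting, which is why the comparison only makes sense between taxa in the same tree.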

SilasK commented 4 years ago

Thank you for your reply.

Yes, my tree spans several phyla. I decorated the tree and predicted outliers as described in the README. Could you explain how I get from that output to an answer to my question? It is not completely clear to me.

donovan-h-parks commented 4 years ago

It requires some manual curation. You need to annotate your tree with the families you suspect. You can then inspect the output of the outlier command to see if these families have an RED value that is similar to other families. I appreciate this isn't an ideal solution for your situation, but PhyloRank isn't really meant to address this direct problem.
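
The comparison described above can be sketched as follows. This is only an illustration: the RED values are invented, the candidate names are hypothetical, and the tolerance of 0.1 around the median is an assumption of this sketch rather than something stated in this thread. It does not read PhyloRank output files; you would fill in the numbers by hand from the outlier results.

```python
from statistics import median
from typing import Dict, List, Tuple


def compare_to_rank(candidate_red: float,
                    reference_reds: List[float],
                    tolerance: float = 0.1) -> Tuple[bool, float]:
    """Return (is_within_tolerance, rank_median) for one candidate node.

    `tolerance` is an assumed cut-off, not a PhyloRank setting.
    """
    rank_median = median(reference_reds)
    return abs(candidate_red - rank_median) <= tolerance, rank_median


# Made-up RED values of established families in the same tree.
family_reds = [0.72, 0.75, 0.77, 0.78, 0.81]

# Made-up RED values of the nodes that would define each suspected family.
candidates: Dict[str, float] = {
    "node_joining_all_three_genomes": 0.58,
    "node_joining_genomes_1_and_2": 0.74,
}

results = {
    name: compare_to_rank(value, family_reds)[0]
    for name, value in candidates.items()
}
for name, consistent in results.items():
    print(f"{name}: family-like RED = {consistent}")
```

With these invented numbers, the node joining all three genomes sits well below the family median, so it looks deeper than a typical family, while the node joining genomes 1 and 2 falls in the family range. A result like this is a hint for manual curation, not a decision.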

fujch7 commented 4 years ago

> Yes. Does your tree span an entire domain? Establishing ranks requires comparing the relative evolutionary divergence (RED) between taxa at the same defined rank (i.e. it is a measure relative to your specific tree and not an absolute measure). Assuming you have this, you need to first decorate your tree and then calculate the RED values as described in the README. We are hoping to provide a better solution in the future, but this is ongoing work.

Does that mean, for example, that if a family contains only 30 species and the genus-level classification of all of them is unclear, we cannot use PhyloRank to calculate RED and classify them at the genus level, because PhyloRank needs some established, closely related genera as a reference?

donovan-h-parks commented 4 years ago

You need a sensible calibration point to determine suitable RED values for defining a genus. GTDB does this by calculating the median RED value of all well-defined bacterial or archaeal genera. Other approaches are certainly possible, but I haven't explored these in detail. Ultimately, I hope to incorporate an approach for resolving this issue into GTDB-Tk, but this is still in development.
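
As a rough sketch of the median-based calibration mentioned above: group the RED values of named taxa by rank and take the median per rank. The taxon names and RED values below are invented, and `median_red_by_rank` is a hypothetical helper, not a PhyloRank or GTDB-Tk function. The only convention assumed is the GTDB-style rank prefix (`f__`, `g__`, and so on).

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List

# GTDB-style rank prefixes.
RANK_PREFIXES = {
    "d__": "domain", "p__": "phylum", "c__": "class", "o__": "order",
    "f__": "family", "g__": "genus", "s__": "species",
}


def median_red_by_rank(taxon_red: Dict[str, float]) -> Dict[str, float]:
    """Median RED of the named taxa at each rank (taxa without a known prefix are skipped)."""
    grouped: Dict[str, List[float]] = defaultdict(list)
    for taxon, red in taxon_red.items():
        rank = RANK_PREFIXES.get(taxon[:3])
        if rank is not None:
            grouped[rank].append(red)
    return {rank: median(values) for rank, values in grouped.items()}


# Invented example: RED of well-defined taxa taken from one decorated tree.
taxon_red = {
    "f__Alphaceae": 0.74, "f__Betaceae": 0.78, "f__Gammaceae": 0.76,
    "g__Alpha": 0.90, "g__Beta": 0.94, "g__Gamma": 0.92,
}

medians = median_red_by_rank(taxon_red)
print(medians)  # {'family': 0.76, 'genus': 0.92}
```

If no well-defined genera are present in the tree, the genus entry is simply missing, which is the situation where another calibration point would be needed.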

SilasK commented 4 years ago

Hey, I've seen that you updated GTDB-Tk. Does https://github.com/Ecogenomics/GTDBTk/pull/244 solve this issue?

donovan-h-parks commented 4 years ago

It aims to help answer such questions, though manual inspection and decision-making are still required.

SilasK commented 3 years ago

Hello, I managed to run gtdbtk infer_ranks on the tree that includes the reference genomes and my genomes. If I understand correctly, it puts the RED values on the tree.

I tried to open it with ete3 (Python), but it didn't understand the format. Could you point me to a tool to visualise and analyse the generated tree so that I can do the manual curation?

donovan-h-parks commented 3 years ago

Hi. You can visualize the output using Dendroscope.