Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

relative evolutionary divergence (RED) meaning #152

ucassee closed this issue 5 years ago

ucassee commented 5 years ago

Hi developers, I would like to learn more about relative evolutionary divergence (RED), including how it is calculated and what it is based on. Could you please give me more information? Thanks in advance.

donovan-h-parks commented 5 years ago

We discuss RED in the GTDB manuscript: https://www.ncbi.nlm.nih.gov/pubmed/30148503
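For readers who want the gist before opening the paper: as I understand the definition in that manuscript, the root of a rooted tree has RED 0, every extant taxon (tip) has RED 1, and an internal node n gets RED = p + (d/u)(1 - p), where p is the RED of n's parent, d is the length of the branch from the parent to n, and u is the average branch length from the parent to all extant taxa descended from n. The sketch below applies that formula top-down on a toy tree. The `Node` class and function names are hypothetical, and this is not GTDB-Tk's implementation, which also deals with tree decoration and rooting details not shown here.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    length: float = 0.0  # length of the branch to this node's parent
    children: list[Node] = field(default_factory=list)
    red: float = 0.0


def mean_depth_to_tips(node: Node) -> float:
    """Average path length from `node` down to each extant taxon (tip) below it."""
    depths: list[float] = []

    def walk(n: Node, depth: float) -> None:
        if not n.children:
            depths.append(depth)
        for child in n.children:
            walk(child, depth + child.length)

    walk(node, 0.0)
    return sum(depths) / len(depths)


def assign_red(root: Node) -> None:
    """Set RED top-down: root = 0, tips = 1, internal nodes interpolated."""
    root.red = 0.0
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in parent.children:
            if not child.children:
                child.red = 1.0  # extant taxa always have RED 1
            else:
                d = child.length
                # u: mean distance from the parent to the tips descended from child
                u = d + mean_depth_to_tips(child)
                if u == 0:
                    child.red = parent.red  # zero-length subtree: no divergence
                else:
                    child.red = parent.red + (d / u) * (1.0 - parent.red)
            stack.append(child)


# Toy tree: root -> (A -> (a1, B -> (b1, b2)), C)
b1, b2 = Node("b1", 1.0), Node("b2", 1.0)
B = Node("B", 1.0, [b1, b2])
a1 = Node("a1", 1.0)
A = Node("A", 1.0, [a1, B])
C = Node("C", 2.0)
root = Node("root", 0.0, [A, C])
assign_red(root)
print(A.red, B.red)  # → 0.375 0.6875
```

Here A is one branch unit below the root while its tips are on average 5/3 units further down, so A sits 3/8 of the way from the root to the present (RED 0.375).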

ucassee commented 5 years ago

I found that the RED values of some bins are approximately 0.35, and these bins are only classified at the phylum level (d__Bacteria;p__Marinisomatota;c__;o__;f__;g__;s__) in GTDB. Can such a low RED value indicate that these bins belong to a new phylum?
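The classification string uses GTDB's rank prefixes (d__, p__, c__, o__, f__, g__, s__), and an empty name after a prefix means the genome was not assigned at that rank. A small hypothetical helper like the one below (not part of GTDB-Tk) can report the most specific rank that was actually assigned, which makes it easy to pick out every bin that stopped at the phylum level.

```python
from __future__ import annotations

# GTDB rank prefixes (without the trailing "__")
RANKS = {"d": "domain", "p": "phylum", "c": "class", "o": "order",
         "f": "family", "g": "genus", "s": "species"}


def deepest_rank(taxonomy: str) -> tuple[str, str]:
    """Return (rank, name) for the most specific rank that has a name."""
    deepest = ("", "")
    for token in taxonomy.split(";"):
        prefix, _, name = token.strip().partition("__")
        if name:  # an empty name (e.g. "c__") means unassigned at that rank
            deepest = (RANKS.get(prefix, prefix), name)
    return deepest


print(deepest_rank("d__Bacteria;p__Marinisomatota;c__;o__;f__;g__;s__"))
# → ('phylum', 'Marinisomatota')
```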

pchaumeil commented 5 years ago

According to GTDB-Tk's rules, those bins are placed in the phylum Marinisomatota because their RED value (0.35) brings the RED value of p__Marinisomatota (0.385) closer to the median phylum-level RED value (0.345).

Your bins would become the most basal members of Marinisomatota.

To verify whether your bins are part of a new phylum, you would need to generate a de novo bootstrapped tree and look at the support of the decorated nodes for the Marinisomatota branch.

ucassee commented 5 years ago

@pchaumeil Thanks for your reply. So do you mean that if the RED values are below the median phylum-level RED value (0.345), these bins are likely to be a new phylum? I am also not sure how I should build a de novo bootstrapped tree. Would something like this work?

gtdbtk de_novo_wf --genome_dir Marinisomatota_dir --bac120_ms --outgroup_taxon p__Chloroflexota --taxa_filter p__Marinisomatota --out_dir de_novo_output

pchaumeil commented 5 years ago

> So do you mean if the RED values are below the median phylum-level RED value (0.345), these bins are likely to be a new phylum?

It depends on which branch the bins are placed on. If your bins have a RED value of 0.33 and are placed on the parent branch of p__Marinisomatota (0.385), they will still be classified as p__Marinisomatota, because they bring the RED value of p__Marinisomatota closer to the median phylum-level RED value (0.345). But if your bins have a RED value of 0.33 and are placed on the parent branch of p__Patescibacteria (0.341), they will be considered a new phylum, because otherwise they would move the RED value of p__Patescibacteria farther from the median phylum-level RED value (0.345).
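The comparison described above can be written as a one-line check. This is a simplified sketch of the rule as explained in this thread, not GTDB-Tk's source code (which handles more cases than this); the function name is hypothetical and the numbers are the ones quoted in the comments.

```python
def joins_existing_taxon(bin_red: float, taxon_red: float, median_red: float) -> bool:
    """Simplified reading of the placement rule described in this thread.

    The bins sit on the parent (stem) branch of an existing taxon. Assigning
    them to that taxon would move the taxon's RED to roughly the bins' RED,
    so they join the taxon only if that brings it closer to the rank's median
    RED; otherwise they are treated as a new taxon at that rank.
    """
    return abs(bin_red - median_red) < abs(taxon_red - median_red)


MEDIAN_PHYLUM_RED = 0.345  # median phylum-level RED quoted above

print(joins_existing_taxon(0.33, 0.385, MEDIAN_PHYLUM_RED))  # p__Marinisomatota → True
print(joins_existing_taxon(0.33, 0.341, MEDIAN_PHYLUM_RED))  # p__Patescibacteria → False
```

In the first case the distance to the median shrinks from 0.040 to 0.015, so the bins join p__Marinisomatota; in the second it would grow from 0.004 to 0.015, so the bins form a new phylum.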

GTDB-Tk doesn't generate bootstrapped trees, so you will have to take the MSA generated by GTDB-Tk and build the bootstrapped tree with your preferred phylogenetic tree construction software.

ucassee commented 5 years ago

Hi @pchaumeil. The gtdbtk.bac120.msa.fasta file contains 23470 sequences. Should I use all of them to generate the bootstrapped tree? I expect that would take a long time. Below is the GTDB-Tk classification result:

bin.25 d__Bacteria;p__Marinisomatota;c__;o__;f__;g__;s__ N/A N/A N/A N/A N/A N/A N/A N/A N/A dBacteria;;c;o;f;g;s Placement taxonomic novelty determined using RED N/A 72.94 11 0.341112511073 N/A
bin.90 d__Bacteria;p__Marinisomatota;c__;o__;f__;g__;s__ N/A N/A N/A N/A N/A N/A N/A N/A N/A dBacteria;;c;o;f;g;s Placement taxonomic novelty determined using RED N/A 94.4 11 0.339904346943 N/A
bin.1 d__Bacteria;p__Marinisomatota;c__;o__;f__;g__;s__ N/A N/A N/A N/A N/A N/A N/A N/A N/A dBacteria;;c;o;f;g;s Placement taxonomic novelty determined using RED N/A 94.74 11 0.338366749158 N/A
bin.3 d__Bacteria;p__Marinisomatota;c__;o__;f__;g__;s__ N/A N/A N/A N/A N/A N/A N/A N/A N/A dBacteria;;c;o;f;g;s Placement taxonomic novelty determined using RED N/A 83.27 11 0.336853438411 N/A
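To pull the RED values out of the GTDB-Tk summary table instead of reading them by eye, a sketch like the following could be used. It assumes the tab-separated summary file written by GTDB-Tk (for example `gtdbtk.bac120.summary.tsv`) with a header that includes the `user_genome`, `classification` and `red_value` columns. The 0.345 threshold is simply the median phylum-level RED value quoted in this thread, and the function name is hypothetical.

```python
from __future__ import annotations

import csv
import io

MEDIAN_PHYLUM_RED = 0.345  # median phylum-level RED quoted in this thread


def bins_below_median(summary_tsv: str,
                      median: float = MEDIAN_PHYLUM_RED) -> list[tuple[str, str, float]]:
    """Return (genome, classification, RED) for rows whose RED is below `median`,
    sorted from the most basal (lowest RED) upwards."""
    hits = []
    for row in csv.DictReader(io.StringIO(summary_tsv), delimiter="\t"):
        red = (row.get("red_value") or "").strip()
        try:
            value = float(red)
        except ValueError:
            continue  # skip rows without a numeric RED value (e.g. "N/A")
        if value < median:
            hits.append((row["user_genome"], row["classification"], value))
    return sorted(hits, key=lambda hit: hit[2])
```

Usage: `with open("gtdbtk.bac120.summary.tsv") as fh: print(bins_below_median(fh.read()))`. For the four bins above, all RED values (0.337 to 0.341) fall below 0.345, which is why the branch they are placed on matters so much.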

Do you mean that the key point in deciding whether they belong to a new phylum is whether they cluster with Marinisomatota in the reference tree? That is, if they are closer to some other phylum than to the reference Marinisomatota genomes, they are likely to be a new phylum?

pchaumeil commented 5 years ago

For a potential new phylum, we would recommend using the full MSA. As a pre-screening step, you could pick one representative per order (or family) to build a bootstrapped tree.
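The pre-screening step above (one reference genome per order, plus your own bins) could be scripted roughly as follows. This is a sketch under assumptions: the MSA is the FASTA alignment written by GTDB-Tk, `taxonomy` is a dict mapping each reference genome ID in that alignment to its GTDB taxonomy string (loaded, for instance, from a GTDB taxonomy file whose IDs match the alignment), and taking the first ID in sorted order is an arbitrary choice. In practice you might prefer the highest-quality genome of each order. All function names are hypothetical.

```python
from __future__ import annotations


def parse_fasta(text: str) -> dict[str, str]:
    """Read a FASTA alignment into {sequence ID: sequence}, keeping file order."""
    seqs: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            seqs[name] = ""
        elif name is not None and line:
            seqs[name] += line
    return seqs


def one_per_rank(taxonomy: dict[str, str], prefix: str = "o__") -> set[str]:
    """Pick one genome ID per named taxon at the given rank (default: order)."""
    chosen: dict[str, str] = {}
    for genome in sorted(taxonomy):
        for token in taxonomy[genome].split(";"):
            token = token.strip()
            if token.startswith(prefix) and len(token) > len(prefix):
                chosen.setdefault(token, genome)  # first ID in sorted order wins
    return set(chosen.values())


def reduced_msa(msa_text: str, taxonomy: dict[str, str],
                user_bins: set[str], prefix: str = "o__") -> str:
    """Write a smaller FASTA with one reference per taxon plus the user bins."""
    keep = one_per_rank(taxonomy, prefix) | user_bins
    seqs = parse_fasta(msa_text)
    records = [f">{name}\n{seq}" for name, seq in seqs.items() if name in keep]
    return "\n".join(records) + "\n"
```

Passing `prefix="f__"` gives the one-per-family variant mentioned above.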

To confirm that genomes form a new phylum, we would build bootstrapped trees using different models and different sets of markers. You also need to take other characteristics of your bins into consideration, such as completeness, contamination and overall genome quality.
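Repeating the inference under several substitution models could look like the sketch below, which only builds the command lines. IQ-TREE 2 is used here purely as one example of "your preferred phylogenetic software" (assumption: the `iqtree2` binary is on your PATH). `-s` gives the alignment, `-m` the model, `-B` the number of ultrafast bootstrap replicates, `-T AUTO` lets IQ-TREE pick the thread count, and `--prefix` names the output files. The model list is illustrative, not a recommendation from the GTDB-Tk developers, and the function name is hypothetical.

```python
from __future__ import annotations


def iqtree_commands(msa_path: str, models: list[str],
                    bootstraps: int = 1000) -> list[list[str]]:
    """Build one IQ-TREE 2 command per substitution model for the same MSA."""
    stem = msa_path.rsplit(".", 1)[0]  # "reduced.faa" -> "reduced"
    commands = []
    for model in models:
        prefix = f"{stem}_{model.replace('+', '_')}"  # e.g. "reduced_LG_G"
        commands.append([
            "iqtree2", "-s", msa_path, "-m", model,
            "-B", str(bootstraps), "-T", "AUTO", "--prefix", prefix,
        ])
    return commands


for cmd in iqtree_commands("reduced.faa", ["LG+G", "WAG+G"]):
    print(" ".join(cmd))
```

Each command can then be run with `subprocess.run(cmd, check=True)`; comparing the bootstrap support for the Marinisomatota node across models (and across marker sets) is the kind of robustness check described above.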

ucassee commented 5 years ago

Hi @pchaumeil.
I got it! Thanks for your patience!

pchaumeil commented 5 years ago

No worries, good luck with your research!