Ecogenomics / GTDBTk

GTDB-Tk: a toolkit for assigning objective taxonomic classifications to bacterial and archaeal genomes.
https://ecogenomics.github.io/GTDBTk/
GNU General Public License v3.0

ANI values for the MAGs #599

Open saras224 opened 3 months ago

saras224 commented 3 months ago

Hi @pchaumeil, I am confused about the ANI values that GTDB-Tk reports when classifying MAGs.

  1. Which value should I use as the ANI between a MAG and its reference: fastani_ani or closest_placement_ani?
  2. Why does GTDB-Tk report NA instead of an ANI value for some MAGs? Is there a cut-off below which the tool does not report an ANI value?
  3. A general question: if the ANI match with the reference is ~70%, which classification level does that correspond to (family or order)?
  4. If a MAG is novel and is to be called a new phylum, what should its ANI match be?

I hope my confusion is clear and that you can clarify these points.

Thanks in advance, Saraswati Awasthi

saras224 commented 2 months ago

Hi @wwood, can you help me with this?

Thanks

pchaumeil commented 2 months ago

Hello, please see my notes below:

  1. Which value should I use as the ANI between a MAG and its reference: fastani_ani or closest_placement_ani?
     fastani_ani should return the closest representative based on ANI alone (the comparison is run against all representatives). closest_placement_ani is run against one genome only (the closest genome in the pplacer tree).

  2. Why does GTDB-Tk report NA instead of an ANI value for some MAGs? Is there a cut-off below which the tool does not report an ANI value?
     Some genomes are too novel to have informative ANI values against existing GTDB representatives. This is the case for a novel order or class, and even a novel family. GTDB-Tk does not return an ANI if the value is too low (<80%) or if the user genome is placed above genus rank in the reference tree.

  3. A general question: if the ANI match with the reference is ~70%, which classification level does that correspond to (family or order)?
     We do not recommend using ANI for anything other than species clustering. You can use AAI (or POCP) for genus delineation.

  4. If a MAG is novel and is to be called a new phylum, what should its ANI match be?
     If a MAG is flagged as a new phylum, ANI should not be taken into account. It requires further investigation using different methods (a de novo tree, for example).
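
To see how the two ANI columns and the NA entries line up with the classification rank in your own results, something like the following minimal sketch may help. It is not part of GTDB-Tk. It assumes a tab-separated summary file with columns named user_genome, classification, fastani_ani and closest_placement_ani; column names can differ between GTDB-Tk versions, so check the header of your own file. The inline data and the helper names (parse_ani, lowest_named_rank, summarize) are invented purely for illustration.

```python
import csv
import io
import math

# Made-up stand-in for a GTDB-Tk summary file (tab-separated).
# Only the columns used below are included; all values are invented.
EXAMPLE_SUMMARY = (
    "user_genome\tclassification\tfastani_ani\tclosest_placement_ani\n"
    "mag_01\td__Bacteria;p__P1;c__C1;o__O1;f__F1;g__G1;s__G1 sp1\t98.7\t98.7\n"
    "mag_02\td__Bacteria;p__P1;c__C1;o__O1;f__F1;g__G1;s__\t85.2\t84.9\n"
    "mag_03\td__Bacteria;p__P1;c__C1;o__O1;f__;g__;s__\tN/A\tN/A\n"
)


def parse_ani(value):
    """Return the ANI as a float, or None if the field is missing or not numeric (e.g. 'N/A')."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return None if math.isnan(number) else number


def lowest_named_rank(classification):
    """Return the prefix of the deepest rank that carries a name, e.g. 'g' for genus."""
    lowest = None
    for field in classification.split(";"):
        prefix, _, name = field.partition("__")
        if name:
            lowest = prefix
    return lowest


def summarize(handle):
    """Collect, per genome, the deepest named rank and both ANI columns."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "genome": row["user_genome"],
            "lowest_rank": lowest_named_rank(row["classification"]),
            "fastani_ani": parse_ani(row.get("fastani_ani")),
            "closest_placement_ani": parse_ani(row.get("closest_placement_ani")),
        })
    return rows


results = summarize(io.StringIO(EXAMPLE_SUMMARY))
for entry in results:
    print(entry)
```

To run this on real output, replace the io.StringIO(...) argument with an open file handle for your own summary file. Genomes whose deepest named rank is above genus would be expected to show None in both ANI fields, in line with point 2 above.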

Hope that helps. Regards, Pierre