EBI-Metagenomics / genomes-catalogue-pipeline

MGnify genome analysis pipeline

Taxonomic annotation is not consistent in metadata file #15

Closed: fplazaonate closed this issue 2 years ago

fplazaonate commented 2 years ago

Hello,

I have noticed that the taxonomic annotation is not consistent among genomes assigned to the same species representative.

Below is my code:

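```r
library(tidyverse)

# Load the genome metadata for the MGnify human-gut catalogue v2.0
genomes_all_metadata <- read_tsv('https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-gut/v2.0/genomes-all_metadata.tsv')

# Count genomes for each (species representative, lineage) pair
lineage_counts <- genomes_all_metadata %>%
  group_by(Species_rep, Lineage) %>%
  summarise(num_genomes = n(), .groups = "drop")

# Keep species representatives that appear with more than one lineage
inconsistent <- lineage_counts %>%
  group_by(Species_rep) %>%
  filter(n() > 1) %>%
  ungroup()

print(inconsistent)
```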

For instance, genomes assigned to MGYG000002478 are sometimes classified as Phocaeicola dorei and sometimes as Bacteroides_B dorei.
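The same check can be packaged as a small function, which makes it easy to rerun against an updated copy of the metadata table. This is a minimal sketch, assuming dplyr is installed and that the table has the `Species_rep` and `Lineage` columns used above; the function name `inconsistent_species` and the toy table are hypothetical and only illustrate the logic.

```r
library(dplyr)

# Hypothetical helper: return every species representative whose genomes
# carry more than one distinct Lineage string, together with the number of
# distinct lineages. An empty result means the table is consistent.
inconsistent_species <- function(metadata) {
  metadata %>%
    distinct(Species_rep, Lineage) %>%          # one row per (rep, lineage) pair
    count(Species_rep, name = "n_lineages") %>% # distinct lineages per rep
    filter(n_lineages > 1)
}

# Hypothetical toy table: rep "A" has two conflicting lineages, rep "B" has one.
toy <- tibble(
  Species_rep = c("A", "A", "B", "B"),
  Lineage     = c("lineage_x", "lineage_y", "lineage_z", "lineage_z")
)

inconsistent_species(toy)
```

Applied to the table loaded with `read_tsv()` above, it should list the affected representatives; on a consistent table it returns zero rows.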

Could you fix this?

Florian

tgurbich commented 2 years ago

Hi Florian,

Thank you for spotting this! The inconsistency arose because the taxonomy of strains carried over from version 1.0 of the catalogue had not been updated to the GTDB release used for the newly added genomes (r202). This is now resolved, and all strains have an updated taxonomy. The corrected metadata table is available on our FTP site: http://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-gut/v2.0/genomes-all_metadata.tsv

Sorry about this and, again, thank you for reporting the error.

Best, Tatiana

fplazaonate commented 2 years ago

Thank you, Tatiana, for fixing this! Best, Florian