EBI-Metagenomics / ebi-metagenomics-cwl

This repository contains the CWL description of the EBI Metagenomics pipeline

Taxonomic annotation is not consistent in metadata file #85

Closed fplazaonate closed 2 years ago

fplazaonate commented 2 years ago

Hello,

I have noticed that the taxonomic annotation is not consistent across genomes assigned to the same species representative.

Below is my code:

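library(tidyverse)

# Load the metadata for all genomes in the human-gut v2.0 catalogue
genomes_all_metadata <- read_tsv('https://ftp.ebi.ac.uk/pub/databases/metagenomics/mgnify_genomes/human-gut/v2.0/genomes-all_metadata.tsv')

# Count genomes for each (species representative, lineage) pair
genomes_all_metadata <- genomes_all_metadata %>%
  select(Species_rep, Lineage) %>%
  group_by(Species_rep, Lineage) %>%
  summarise(num_genomes = n(), .groups = "drop")

# Keep only species representatives associated with more than one lineage
genomes_all_metadata <- genomes_all_metadata %>%
  group_by(Species_rep) %>%
  filter(n() > 1) %>%
  ungroup()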

For instance, genomes assigned to MGYG000002478 are sometimes classified as Phocaeicola dorei and sometimes as Bacteroides_B dorei.

Could you fix this?

Florian

fplazaonate commented 2 years ago

My bad! I thought I was in the EBI-Metagenomics/genomes-pipeline repository.