NBISweden / aMeta

Ancient microbiome snakemake workflow
MIT License

Some interesting microbes are ignored because they are labelled as "no rank" in the ncbi database #60

Open ZoePochon opened 2 years ago

ZoePochon commented 2 years ago

Human parvovirus B19, Influenza A virus, and many other viruses are classified as "no rank" in the NCBI taxonomy database. This is problematic because we filter the KrakenUniq output to keep only the "species" level, which de facto excludes these "no rank" viruses. I would suggest keeping "no rank" taxa throughout the pipeline, because I imagine customizing such a large database would be trickier. What do you think?

Human parvovirus B19: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10798&lvl=3&lin=f&keep=1&srchmode=1&unlock
Influenza A virus: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=382835&lvl=3&lin=f&keep=1&srchmode=1&unlock
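
To illustrate the effect of the rank filter, here is a minimal sketch. It assumes a tab-separated report with `taxID`, `rank` and `taxName` columns; the real KrakenUniq report has more columns, and the read counts below are made up. `filter_by_rank` is a hypothetical helper, not a function from aMeta.

```python
import csv
import io

# Hypothetical report excerpt; read counts are invented for illustration.
# Taxids 10798 and 382835 are the ones from the links above.
REPORT = (
    "taxID\trank\ttaxName\treads\n"
    "632\tspecies\tYersinia pestis\t120\n"
    "10798\tno rank\tHuman parvovirus B19\t45\n"
    "382835\tno rank\tInfluenza A virus\t30\n"
    "1279\tgenus\tStaphylococcus\t300\n"
)


def filter_by_rank(report_text, keep_ranks):
    """Return the report rows whose rank is in keep_ranks."""
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    return [row for row in reader if row["rank"].strip() in keep_ranks]


# Species-only filtering drops both viruses.
species_only = filter_by_rank(REPORT, {"species"})

# Keeping "no rank" as well retains them (and every other "no rank" taxon).
with_no_rank = filter_by_rank(REPORT, {"species", "no rank"})
```

With the species-only set, the two viral rows disappear even though they have reads assigned; adding "no rank" brings them back, along with every other unranked taxon in a real report.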

ZoePochon commented 2 years ago

The same problem arises for Hepatitis B virus. Many good reference sequences for it are ignored because they are designated as "no rank" in the NCBI database, for example HBV Genotype A, HBV Genotype B, HBV Genotype C, and so on. As a result, it does not have many good references to work with.

ZoePochon commented 1 year ago

So, I've been digging into this "no rank" issue, and I think the most straightforward fix would be to change "no rank" to "species" directly in the database for the pathogens we are interested in, because including all "no rank" taxa would just give us too much filtering work afterwards.
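
A rough sketch of what such a rank change could look like, assuming the database taxonomy comes from an NCBI-style `nodes.dmp` file (fields separated by `\t|\t`, with tax_id, parent tax_id and rank as the first three fields). `promote_to_species` is a hypothetical helper, the example lines are abridged, and the parent taxids in them are placeholders. Whether the database has to be rebuilt after editing the taxonomy is not covered here.

```python
def promote_to_species(nodes_lines, taxids):
    """Yield nodes.dmp lines, rewriting 'no rank' to 'species' for the given taxids.

    taxids is a set of taxid strings; all other lines pass through unchanged.
    """
    for line in nodes_lines:
        fields = line.rstrip("\n").split("\t|\t")
        # Assumed field order: tax_id, parent tax_id, rank, ...
        if (
            len(fields) > 2
            and fields[0].strip() in taxids
            and fields[2].strip() == "no rank"
        ):
            fields[2] = "species"
            line = "\t|\t".join(fields) + "\n"
        yield line


# Abridged, hypothetical nodes.dmp lines (parent taxids are placeholders).
example_lines = [
    "10798\t|\t1\t|\tno rank\t|\t\t|\n",
    "382835\t|\t1\t|\tno rank\t|\t\t|\n",
]

# Only the taxids we explicitly list are promoted.
patched = list(promote_to_species(example_lines, {"10798"}))
```

Restricting the change to an explicit set of taxids keeps every other "no rank" node untouched, so the species filter downstream does not suddenly let through unrelated taxa.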

LeandroRitter commented 1 year ago

Agreed, @ZoePochon. I would not include all "no rank" taxa in the follow-up steps; that would bring in too much rubbish. If you have a list of interesting microbes that for some reason have the "no rank" taxonomic level in our DB, it would be best to solve this ad hoc.
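
One possible reading of the ad hoc approach, as a sketch only: keep a small curated list of taxids and let those "no rank" entries through the species filter. The list format (one taxid per line, `#` comments) and the helpers `parse_whitelist` and `keep_row` are assumptions for illustration, not part of aMeta.

```python
# Hypothetical curated list; the taxids are the ones linked earlier in this issue.
WHITELIST_TEXT = """
# "no rank" taxa of interest
10798   # Human parvovirus B19
382835  # Influenza A virus entry linked in this issue
"""


def parse_whitelist(text):
    """Parse one taxid per line; blank lines and '#' comments are ignored."""
    taxids = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            taxids.add(entry)
    return taxids


def keep_row(rank, taxid, whitelist):
    """Keep species rows, plus 'no rank' rows whose taxid is on the curated list."""
    return rank == "species" or (rank == "no rank" and taxid in whitelist)


whitelist = parse_whitelist(WHITELIST_TEXT)
```

This leaves the database untouched and keeps the exceptions in one reviewable file, at the cost of having to maintain that list by hand.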