Open ZoePochon opened 2 years ago
The same problem arises for Hepatitis B virus. Many good references for it are ignored because they are designated "no rank" in the NCBI database, for example HBV Genotype A, HBV Genotype B, HBV Genotype C and so on. As a result, the pipeline has few good references to work with.
I've been digging into this "no rank" issue, and I think the most straightforward fix would be to change "no rank" to "species" directly in the database for the pathogens we are interested in, because including all "no rank" taxa would leave us with too much filtering work afterwards.
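A rough sketch of what that database edit could look like, assuming the database is rebuilt from an NCBI-style `nodes.dmp` (fields separated by `\t|\t`, rank in the third field). The function name `promote_to_species` is hypothetical, and the taxonomy IDs are simply the ones from the NCBI links in this thread; whether the pipeline's database can be patched this way has not been checked.

```python
# Sketch only: rewrite the rank of selected taxa in NCBI-style nodes.dmp lines.
# Assumption: fields are separated by "\t|\t" and the third field is the rank.

# Taxonomy IDs taken from the NCBI links in this thread (illustrative list).
TAXIDS_TO_PROMOTE = {"10798", "382835"}


def promote_to_species(nodes_lines, taxids=TAXIDS_TO_PROMOTE):
    """Return nodes.dmp lines with 'no rank' changed to 'species' for the given taxids."""
    out = []
    for line in nodes_lines:
        fields = line.split("\t|\t")
        # Only touch well-formed rows for a listed taxid that is currently 'no rank'.
        if (
            len(fields) > 2
            and fields[0].strip() in taxids
            and fields[2].strip() == "no rank"
        ):
            fields[2] = "species"
        out.append("\t|\t".join(fields))
    return out
```

Every other line is passed through unchanged, so the output can be written back as a patched `nodes.dmp` before rebuilding.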
Agreed @ZoePochon, I would not include all "no rank" taxa in the follow-up steps; that would bring in too much rubbish. If you have a list of interesting microbes that for some reason have the "no rank" taxonomic level in our DB, it would be best to solve this ad hoc.
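The ad hoc approach could also be applied at the filtering step rather than in the database. Below is a minimal sketch, assuming the KrakenUniq report is tab-separated with a header row containing `taxID` and `rank` columns. The function name `filter_report` and the allowlist are hypothetical; the two taxonomy IDs are just the ones from the NCBI links in this thread.

```python
import csv

# Hypothetical allowlist of "no rank" taxa to keep (IDs from the links in this thread).
NO_RANK_ALLOWLIST = {"10798", "382835"}


def filter_report(report_text, allowlist=NO_RANK_ALLOWLIST):
    """Keep species-level rows, plus 'no rank' rows whose taxID is in the allowlist."""
    # Drop blank lines and '#' comment lines so the first remaining line is the header.
    lines = [
        line
        for line in report_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    reader = csv.DictReader(lines, delimiter="\t")
    kept = []
    for row in reader:
        rank = row["rank"].strip()
        taxid = row["taxID"].strip()
        if rank == "species" or (rank == "no rank" and taxid in allowlist):
            kept.append(row)
    return kept
```

This keeps the species filter as it is and only lets through the "no rank" taxa that were explicitly listed, so the rest of the "no rank" entries are still excluded.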
Human parvovirus B19, Influenza A virus, and many other viruses are classified as "no rank" in the NCBI database. This is problematic because we filter the KrakenUniq output to keep only the "species" level, which de facto excludes the "no rank" viruses. I would suggest keeping "no rank" throughout the pipeline, because I imagine customizing such a large database would be trickier. What do you think?
Human parvovirus B19: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=10798&lvl=3&lin=f&keep=1&srchmode=1&unlock

Influenza A virus: https://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?mode=Info&id=382835&lvl=3&lin=f&keep=1&srchmode=1&unlock