david-barnett / microViz

R package for microbiome data visualization and statistics. Uses phyloseq, vegan and the tidyverse. Docker image available.
https://david-barnett.github.io/microViz/
GNU General Public License v3.0

comp_barplot seems to have issues when the domain is unknown #147

Closed: NeginValizadegan closed this issue 1 month ago

NeginValizadegan commented 3 months ago

Hello,

I have an ASV whose taxonomic ranks are all unclassified, as follows:

    Taxonomy Table:     [1 taxa by 7 taxonomic ranks]:
                                     Domain         Phylum         Class          Order          Family         Genus          Species
    55b5947908a70f952516e813b1ca9e71 "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified" "Unclassified"

Now, when using the following to create a barplot:

comp_barplot(physeq.plot, "Phylum", n_taxa = 20, label = "group")

I get the following error:

    Error: Taxa cannot be aggregated at rank: Phylum
    See last message for convergent taxa rows.

    To fix the problem, try:
      yourData %>% tax_fix(unknowns = c("Unclassified"))

    Try tax_fix_interactive() to find and fix further problems

When I use `yourData %>% tax_fix(unknowns = c("Unclassified"))` to fix it, Genus and Species become NA, and instead of putting the unknown taxon in its own group or in "other", the barplot shows it by its ASV label, 55b5947908a70f952516e813b1ca9e71. I think this is a bug.

[Image: Screenshot 2024-04-01 at 3 09 52 PM]

david-barnett commented 3 months ago

Hi Negin

I think this is expected behaviour, not a bug. You have a completely unclassified sequence, with a long ID code as a taxon name.
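To illustrate that behaviour, here is a minimal sketch on a toy phyloseq object. The counts, the `ASV_A`/`ASV_B` names and their taxonomy are invented for illustration; only the long ID and the `tax_fix(unknowns = ...)` call come from this thread. The second call uses `anon_unique = FALSE`, which, as far as I read `?tax_fix`, gives fully unclassified taxa a shared generic label instead of their own taxon name; treat that as an assumption and check the documentation for your installed version.

```r
library(phyloseq)
library(microViz)

# Toy data, invented for illustration (not the reporter's dataset)
long_id <- "55b5947908a70f952516e813b1ca9e71"
taxa <- c("ASV_A", "ASV_B", long_id)
samples <- paste0("S", 1:4)

counts <- matrix(
  c(120, 90, 200, 150,
    80, 60, 40, 110,
    0, 0, 3, 0),
  nrow = 3, byrow = TRUE, dimnames = list(taxa, samples)
)
taxonomy <- matrix(
  c("Bacteria", "Firmicutes", "Lactobacillus",
    "Bacteria", "Bacteroidota", "Bacteroides",
    "Unclassified", "Unclassified", "Unclassified"),
  nrow = 3, byrow = TRUE,
  dimnames = list(taxa, c("Domain", "Phylum", "Genus"))
)
ps <- phyloseq(otu_table(counts, taxa_are_rows = TRUE), tax_table(taxonomy))

# Default call suggested by the error message: inspect what the
# fully unclassified row is filled with
ps_fixed <- tax_fix(ps, unknowns = c("Unclassified"))
print(tax_table(ps_fixed)[long_id, ])

# Assumption: anon_unique = FALSE uses a generic label for such rows
# rather than the taxon's own name
ps_grouped <- tax_fix(ps, unknowns = c("Unclassified"), anon_unique = FALSE)
print(tax_table(ps_grouped)[long_id, ])
```

Comparing the two printed rows shows which label `comp_barplot()` would pick up when aggregating at a given rank.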

If it could not be classified even to a domain, then either you've discovered something entirely novel or, more likely, it's chimeric, the result of substantial sequencing error, or some other problem.

If you want a completely unclassified sequence to remain in your dataset, but with a better name, you can manually rename it, e.g. something like `taxa_names(your_phyloseq) <- gsub("long unwanted ID string", "Mystery_Taxon", taxa_names(your_phyloseq))`
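As a small sketch of that renaming step, the snippet below runs the same `gsub()` call on a plain character vector standing in for `taxa_names(your_phyloseq)`, so it needs only base R. `ASV_A` and `ASV_B` are made-up names; `fixed = TRUE` is my addition, so that the ID is matched literally rather than as a regular expression.

```r
# Hypothetical taxa names standing in for taxa_names(your_phyloseq)
taxa_ids <- c(
  "55b5947908a70f952516e813b1ca9e71", # the fully unclassified ASV from this issue
  "ASV_A", "ASV_B"                    # made-up names for illustration
)

unwanted_id <- "55b5947908a70f952516e813b1ca9e71"

# fixed = TRUE treats the ID as a literal string, not a regular expression
renamed <- gsub(unwanted_id, "Mystery_Taxon", taxa_ids, fixed = TRUE)
print(renamed)

# With a real phyloseq object the same call would be wrapped as:
# phyloseq::taxa_names(your_phyloseq) <- gsub(
#   unwanted_id, "Mystery_Taxon",
#   phyloseq::taxa_names(your_phyloseq), fixed = TRUE
# )
```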

But most likely it is better to remove it from your dataset. It's probably very rare (but check this), and you can remove it with `your_phyloseq %>% tax_select("long_id_string", deselect = TRUE)`. Or, if it is e.g. an error present in only one sample, you could remove it as part of filtering out all very rare sequences, e.g. with `tax_filter(min_prevalence = 2)`.
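A rough sketch of the "check, then remove" steps, again on invented toy data (the counts, sample names, `group` column and `ASV_A`/`ASV_B` are all hypothetical). The rarity check uses plain phyloseq accessors; the two removal calls are the ones mentioned above, written without the pipe.

```r
library(phyloseq)
library(microViz)

# Toy data, invented for illustration: 3 taxa (rows) x 4 samples (columns)
long_id <- "55b5947908a70f952516e813b1ca9e71"
taxa <- c("ASV_A", "ASV_B", long_id)
samples <- paste0("S", 1:4)

counts <- matrix(
  c(120, 90, 200, 150,
    80, 60, 40, 110,
    0, 0, 3, 0),
  nrow = 3, byrow = TRUE, dimnames = list(taxa, samples)
)
taxonomy <- matrix(
  c("Bacteria", "Firmicutes",
    "Bacteria", "Bacteroidota",
    "Unclassified", "Unclassified"),
  nrow = 3, byrow = TRUE,
  dimnames = list(taxa, c("Domain", "Phylum"))
)
metadata <- data.frame(group = c("a", "a", "b", "b"), row.names = samples)
ps <- phyloseq(
  otu_table(counts, taxa_are_rows = TRUE),
  tax_table(taxonomy),
  sample_data(metadata)
)

# How rare is it? Share of all reads, and number of samples it appears in
rel_abund <- taxa_sums(ps)[[long_id]] / sum(taxa_sums(ps))
prevalence <- sum(counts[long_id, ] > 0)
print(c(rel_abund = rel_abund, prevalence = prevalence))

# Option 1: drop this one taxon by name
ps_dropped <- tax_select(ps, long_id, deselect = TRUE)

# Option 2: drop every taxon detected in fewer than 2 samples
ps_filtered <- tax_filter(ps, min_prevalence = 2)
```

Option 2 also removes any other taxa seen in a single sample, so it is the broader of the two choices.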

best David

david-barnett commented 1 month ago

Assuming this issue is resolved. Feel free to open a new issue if not :)