gmteunisse / fantaxtic

Fantaxtic - Nested Bar Plots for Phyloseq Data

name_na_taxa function is not assigning higher taxonomic rank to genus or higher rank #38

Closed roopender-bioinfo closed 4 months ago

roopender-bioinfo commented 4 months ago

I used the name_na_taxa function to relabel every NA taxon with its nearest known higher taxonomic rank. However, the function only worked at the Species rank. All NAs in Species were given an "Unknown" label, but NA cells in Genus, Family, Order, or any higher rank were left unchanged. If the genus was NA, the species was labeled "Unknown NA"; if the genus was known, the species was labeled "Unknown" followed by the genus name.

ps_tmp <- name_na_taxa(top_nested$ps_obj, include_rank = F, na_label = "Unknown ")
tax_table(ps_tmp) %>% kable(format = "markdown")

ps_tmp@tax_table
Taxonomy Table: [31 taxa by 7 taxonomic ranks]:
Kingdom Phylum Class Order
ASV1 "Bacteria" "Firmicutes" "Bacilli" "Bacillales"
ASV10 "Bacteria" "Proteobacteria" "Other Proteobacteria" "Other Proteobacteria"
ASV100 "Bacteria" "Firmicutes" "Other Firmicutes" "Other Firmicutes"
ASV10005 "Other" "Other" "Other" "Other"
ASV10010 "Bacteria" "Bacteroidetes" "Other Bacteroidetes" "Other Bacteroidetes"
ASV10021 "Bacteria" "Actinobacteria" "Other Actinobacteria" "Other Actinobacteria"
ASV11 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Betaproteobacteriales"
ASV1298 "Bacteria" "Chloroflexi" "Other Chloroflexi" "Other Chloroflexi"
ASV1318 "Bacteria" "Chloroflexi" "Anaerolineae" "Caldilineales"
ASV133 "Bacteria" "Bacteroidetes" "Bacteroidia" "Cytophagales"
ASV17 "Bacteria" "Firmicutes" "Bacilli" "Bacillales"
ASV18 "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales"
ASV1913 "Bacteria" "Chloroflexi" "Chloroflexia" "Chloroflexales"
ASV2 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Pseudomonadales"
ASV21 "Bacteria" "Bacteroidetes" "Bacteroidia" "Chitinophagales"
ASV217 "Bacteria" "Bacteroidetes" "Bacteroidia" "Flavobacteriales"
ASV2371 "Bacteria" "Chloroflexi" "KD4-96" "NA"
ASV24 "Bacteria" "Actinobacteria" "Actinobacteria" "Micrococcales"
ASV240 "Bacteria" "Bacteroidetes" "Bacteroidia" "Bacteroidales"
ASV28 "Bacteria" "Actinobacteria" "Actinobacteria" "Propionibacteriales"
ASV3 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Betaproteobacteriales"
ASV3126 "Bacteria" "Chloroflexi" "Chloroflexia" "Chloroflexales"
ASV32 "Bacteria" "Actinobacteria" "Actinobacteria" "Corynebacteriales"
ASV4 "Bacteria" "Proteobacteria" "Alphaproteobacteria" "Sphingomonadales"
ASV6 "Bacteria" "Proteobacteria" "Gammaproteobacteria" "Pseudomonadales"
ASV66 "Bacteria" "Actinobacteria" "Actinobacteria" "Corynebacteriales"
ASV77 "Bacteria" "Firmicutes" "Bacilli" "Bacillales"
ASV81 "Bacteria" "Firmicutes" "Clostridia" "Clostridiales"
ASV82 "Bacteria" "Actinobacteria" "Actinobacteria" "Micrococcales"
ASV918 "Bacteria" "Chloroflexi" "Chloroflexia" "Thermomicrobiales"
ASV93 "Bacteria" "Firmicutes" "Bacilli" "Bacillales"
Family Genus Species
ASV1 "Paenibacillaceae" "Brevibacillus" "Unknown Brevibacillus"
ASV10 "Other Proteobacteria" "Other Proteobacteria" "Other Proteobacteria"
ASV100 "Other Firmicutes" "Other Firmicutes" "Other Firmicutes"
ASV10005 "Other" "Other" "Unknown Other"
ASV10010 "Other Bacteroidetes" "Other Bacteroidetes" "Other Bacteroidetes"
ASV10021 "Other Actinobacteria" "Other Actinobacteria" "Other Actinobacteria"
ASV11 "Burkholderiaceae" "Aquabacterium" "Unknown Aquabacterium"
ASV1298 "Other Chloroflexi" "Other Chloroflexi" "Other Chloroflexi"
ASV1318 "Caldilineaceae" "NA" "Unknown NA"
ASV133 "Spirosomaceae" "Flectobacillus" "Unknown Flectobacillus"
ASV17 "Bacillaceae" "Fictibacillus" "Unknown Fictibacillus"
ASV18 "ML635J-40_aquatic_group" "NA" "Unknown NA"
ASV1913 "Roseiflexaceae" "NA" "Unknown NA"
ASV2 "Moraxellaceae" "Acinetobacter" "Unknown Acinetobacter"
ASV21 "Chitinophagaceae" "Vibrionimonas" "Unknown Vibrionimonas"
ASV217 "Flavobacteriaceae" "Flavobacterium" "Unknown Flavobacterium"
ASV2371 "NA" "NA" "Unknown NA"
ASV24 "Bogoriellaceae" "Georgenia" "Unknown Georgenia"
ASV240 "Bacteroidaceae" "Bacteroides" "Unknown Bacteroides"
ASV28 "Propionibacteriaceae" "Cutibacterium" "Unknown Cutibacterium"
ASV3 "Burkholderiaceae" "Cupriavidus" "Unknown Cupriavidus"
ASV3126 "Chloroflexaceae" "Candidatus_Chloroploca" "Unknown Candidatus_Chloroploca"
ASV32 "Nocardiaceae" "Gordonia" "Unknown Gordonia"
ASV4 "Sphingomonadaceae" "Sphingomonas" "Unknown Sphingomonas"
ASV6 "Pseudomonadaceae" "Pseudomonas" "Unknown Pseudomonas"
ASV66 "Mycobacteriaceae" "Mycobacterium" "Unknown Mycobacterium"
ASV77 "Bacillaceae" "Bacillus" "Unknown Bacillus"
ASV81 "Family_XI" "Soehngenia" "Unknown Soehngenia"
ASV82 "Microbacteriaceae" "Microbacterium" "Unknown Microbacterium"
ASV918 "JG30-KF-CM45" "NA" "Unknown NA"
ASV93 "Staphylococcaceae" "Staphylococcus" "Unknown Staphylococcus"

gmteunisse commented 4 months ago

Thanks for your issue. Looking at your code, I notice that you used na_label = 'Unknown'. However, you need to add the <tax> and <rank> tags to the label (you can omit <rank> if include_rank = F). Apologies for not making this clear in the documentation. If that doesn't resolve the issue, could you please provide a reproducible example with GlobalPatterns?

name_na_taxa(
    ps_obj,
    include_rank = F,
    na_label = "Unknown <tax>"
)
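
To make the role of the tags concrete, here is a minimal sketch of how a label template with <tax> and <rank> placeholders could be filled in by hand. This is only an illustration and not fantaxtic's implementation: fill_label is a hypothetical helper, and the taxon and rank values are picked from the table above.

```r
# Hypothetical helper, NOT part of fantaxtic: substitute the two
# placeholders in a label template with a taxon name and a rank name.
fill_label <- function(template, tax, rank) {
  out <- gsub("<tax>", tax, template, fixed = TRUE)
  gsub("<rank>", rank, out, fixed = TRUE)
}

label <- fill_label("Unknown <tax> (<rank>)", "Chloroflexi", "Phylum")
print(label)
#> [1] "Unknown Chloroflexi (Phylum)"
```

A template without the <tax> placeholder has nothing to substitute, so no taxon name can end up in the label.
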
roopender-bioinfo commented 4 months ago

I did, and it's still not working: the "Unknown" label is only assigned to NAs in the Species column.

ps_tmp <- name_na_taxa(top_nested$ps_obj, include_rank = T, na_label = "Unknown ()")
tax_table(ps_tmp) %>% kable(format = "markdown")

However, when I ran this on the GlobalPatterns data, it worked fine.

ps_tmp <- name_na_taxa(GlobalPatterns, include_rank = T, na_label = "Unknown ()")
view(tax_table(ps_tmp))

Details of my data:

carbom
phyloseq-class experiment-level object
otu_table()   OTU Table:      [ 9845 taxa and 22 samples ]
sample_data() Sample Data:    [ 22 samples by 12 sample variables ]
tax_table()   Taxonomy Table: [ 9845 taxa by 7 taxonomic ranks ]

top_nested <- nested_top_taxa(carbom,
                              top_tax_level = "Phylum",
                              nested_tax_level = "Genus",
                              n_top_taxa = 5,
                              n_nested_taxa = 5)

gmteunisse commented 4 months ago

Thanks for checking. If there is no issue with running the function on GlobalPatterns, then it means that the function is working as intended; however, your phyloseq object is not formatted as expected. The fact that you're getting "NA" and "Unknown NA" makes me think that there are no true NA values in your table, but rather that they are strings that say "NA" - could that be correct? Fantaxtic expects true NAs like in the image below. Can you check your phyloseq object, or provide a reproducible example?

image
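
One way to check for this is to count true missing values and literal "NA" strings per rank. The sketch below uses a small hand-made character matrix as a stand-in for the taxonomy table (the ASV names and values are made up for illustration); with a real phyloseq object, the matrix would presumably come from its tax_table instead.

```r
# Toy taxonomy matrix (hypothetical values) standing in for a real tax_table.
# ASV_a has the string "NA" in Genus, ASV_b has a true missing value.
tax <- matrix(
  c("Bacteria", "Chloroflexi", "NA",
    "Bacteria", "Firmicutes",  NA),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ASV_a", "ASV_b"), c("Kingdom", "Phylum", "Genus"))
)

# Count true missing values and literal "NA" strings in each rank
true_na   <- colSums(is.na(tax))
string_na <- colSums(!is.na(tax) & tax == "NA")

print(rbind(true_na, string_na))
```

Any non-zero count in the string_na row means that rank contains text that merely looks like a missing value.
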

PS When I run your code I get an error (see below), because you haven't added the <tax> tag in your na_label. Are you sure you are using fantaxtic::name_na_taxa()?

require("fantaxtic")
#> Loading required package: fantaxtic
require("magrittr")
#> Loading required package: magrittr
data(GlobalPatterns)
name_na_taxa(GlobalPatterns, na_label = "Unknown ()")
#> Error in name_na_taxa(GlobalPatterns, include_rank = F, na_label = "Unknown ()"): Error: include '<tax>' in the na_label

Created on 2024-07-10 by the reprex package (v2.0.1)

roopender-bioinfo commented 4 months ago

It worked! You were right: the "NA" strings in my data were causing the problem. I converted them to true NA values and it worked. Thanks, buddy. Here is the solution for transforming "NA" string values into real NA values.

# Install and load the necessary packages
install.packages("readxl")
install.packages("dplyr")
install.packages("writexl")

library(readxl)
library(dplyr)
library(writexl)

# Load your Excel file into a data frame
file_path <- "path_to_your_excel_file.xlsx"
df <- read_excel(file_path)

# Replace "NA" strings with actual NA values
# (character columns only, so non-text columns do not raise a type error)
df <- df %>% mutate(across(where(is.character), ~ na_if(.x, "NA")))

# View the updated data frame
head(df)

Use df as the tax_table for your phyloseq object, then run name_na_taxa; it will work.
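
If the taxonomy table already lives inside a phyloseq object, the same conversion can probably be done directly in R without going through Excel. The sketch below shows the idea on a small hand-made character matrix (values made up for illustration). The commented lines at the end show how it might be applied to a phyloseq object; the object name ps is an assumption.

```r
# Toy taxonomy matrix (hypothetical values) with one literal "NA" string
tax <- matrix(
  c("Bacteria", "Chloroflexi", "NA",
    "Bacteria", "Firmicutes",  "Bacillus"),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("ASV_a", "ASV_b"), c("Kingdom", "Phylum", "Genus"))
)

# Turn every literal "NA" string into a true missing value.
# which() drops comparisons that are themselves NA, so existing NAs are safe.
tax[which(tax == "NA")] <- NA

print(is.na(tax))

# With phyloseq loaded and a phyloseq object `ps` (assumed name),
# the same idea would look roughly like this:
#   tax <- as(tax_table(ps), "matrix")
#   tax[which(tax == "NA")] <- NA
#   tax_table(ps) <- tax_table(tax)
```
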