gmteunisse / fantaxtic

Fantaxtic - Nested Bar Plots for Phyloseq Data
26 stars 3 forks source link

2-level relative abundance bar plot of microbial metagenomics data: help with code #12

Closed NaomiVi closed 1 year ago

NaomiVi commented 1 year ago

Hi,

I have been trying to visual my microbial metagenomics data (i.e., phyloseq objects) in a relative abundance bar plot using Phylum as top rank and Class as nested rank, as well as doing the same with a relative abundance bar plot using Functional information as top rank and Class as nested rank, for each sampling depth (samples). The fantaxtic R package seems to be perfect for that, but when I plot the data after applying the function _nested_toptaxa() I get this error message:

_Error in h(simpleError(msg, call)) : error in evaluating the argument 'object' in selecting a method for function 'taxtable': subscript out of bounds

I also get the following error message when using the function _plot_nestedbar() :

_Error in nested_palette(data, group, subgroup, gradient_type, min_l, maxl, : Error: 6 values required in clr.pal, 5 provided.

This keeps happening regardless of the number of n_top_taxa or n_nested_taxa I choose in the _nested_toptax() function.

Could I ask for any recommendations/ insights you may have please?

This is my code and data at the moment:

ps_obj1 #my phyloseq object

phyloseq-class experiment-level object otu_table() OTU Table: [ 2163 taxa and 8 samples ] sample_data() Sample Data: [ 8 samples by 8 sample variables ] tax_table() Taxonomy Table: [ 2163 taxa by 6 taxonomic ranks ] #taxonomic ranks = "t"= trophy; "k" = kingdom; "p"= phylum; "c" = Class; "o"= order; "f"= family

tax_table(ps_obj1) #visualise tax_table()

Taxonomy Table: [2163 taxa by 6 taxonomic ranks]: t k p c
OTU_1 "phagotrophs" "Eukaryota" "Stramenopiles_X" "Bicoecea"
o f
OTU_1 "Anoecales" "Caecitellaceae"

nov_top_nested1 <- nested_top_taxa(ps_obj1, #Phylum as top rank, Class as nested rank top_tax_level = "p" , nested_tax_level = "c", n_top_taxa= 7, n_nested_taxa = 3)

tax_nov_barplot1 <- plot_nested_bar(ps_obj = nov_top_nested1$ps_obj, top_level = "p", nested_level = "c")

_Error in nested_palette(data, group, subgroup, gradient_type, min_l, maxl, : Error: 7 values required in clr.pal, 6 provided.

nov_top_nested2 <- nested_top_taxa(ps_obj1, #Trophy as top rank, Class as nested rank top_tax_level = "t" , nested_tax_level = "c", n_top_taxa = 5, n_nested_tax_level= 3)

_Error in h(simpleError(msg, call)) : error in evaluating the argument 'object' in selecting a method for function 'taxtable': subscript out of bounds

troph_oct_barplot1 <- plot_nested_bar(ps_obj = nov_top_nested2$ps_obj, top_level = "t", nested_level = "c")

Thank you in advance for your help.

Best wishes, Naomi

gmteunisse commented 1 year ago

Hi Naomi, thanks for giving fantaxtic a try. Sounds like something is going wrong indeed, perhaps we can figure out together where the problem originates.

Your code examples seem fine (although in your nov_top_nested2 example you specify the non-existing argument n_nested_tax_level = 3), so I think the problem must originate some place else.

Could you let me know whether you get the same errors when you follow the example from the documentation that uses GlobalPatterns? If so, there might be a dependency issue.

If not, then it could be that there is a mismatch between my code and your data. In order the check where the error then originates, I would need a minimal dataset with which I can reproduce the error. Would you be able to provide that for me?

NaomiVi commented 1 year ago

Thank you very much for the quick reply.

I have ran the following code from the documentation and did not get the same errors:

#Test

data(GlobalPatterns) top_asv <- top_taxa(GlobalPatterns, n_taxa = 10) plot_nested_bar(ps_obj = top_asv$ps_obj, top_level = "Phylum", nested_level = "Species") top_nested <- nested_top_taxa(GlobalPatterns, top_tax_level = "Phylum", nested_tax_level = "Species", n_top_taxa = 3, n_nested_taxa = 3) plot_nested_bar(ps_obj = top_nested$ps_obj, top_level = "Phylum", nested_level = "Species")

I have also tried to run the script with my data without the nested_top_taxa() function and I do not get any errors.

oct_top_nested <- top_taxa(ps_obj1, n_taxa = 10) tax_oct_barplot1 <- plot_nested_bar(ps_obj = oct_top_nested$ps_obj, top_level = "t", nested_level = "c")

I have attached a subset of my data (ASV table + sample metadata + taxonomy table) to check where the mismatch could come from. OTU_taxonomy.csv Subset_data.csv sample_metadata.csv

Here is the code I run to convert my data to a phyloseq object:

otu_data <- read_csv("subset_data.csv") # OTU table otu_data <-otu_data [!(otu_data$Trophy=="parasites"),] # if 'parasites' remove row otu_data <-otu_data [!(otu_data$Trophy=="saprotrophs"),] # if 'saprotrophs' remove row tax_data <- read_csv("OTU_taxonomy.csv") sample_data <- read_csv("sample_metadata2.csv", col_types = "ccccccccc") # each "c" means a column of "character" tax_data$OTU_ID <- as.character(tax_data$OTU_ID) # must be same type for join to work otu_data$OTU_ID <- as.character(otu_data$OTU_ID) # must be same type for join to work otu_data <- left_join(otu_data, tax_data, by = c("OTU_ID" = "OTU_ID")) # identifies cols with shared IDs obj <- parse_tax_data(otu_data, class_cols = "Taxonomy.x", class_sep = ";", classregex = "^([a-z]{0,1}){0,2}(.*)$", class_key = c("tax_rank" = "taxon_rank", "name" = "taxon_name")) obj$data$class_data <- NULL # get rid of "class_data" names(obj$data) <- "otu_counts" # rename the "tax_data" obj$data$otu_counts <- obj$data$otu_counts[c("taxon_id", "OTU_ID", sample_data$Sample_ID)] obj$data$otu_counts <- zero_low_counts(obj, "otu_counts", min_count = 10, # remove OTUs for which reads < 10 other_cols = TRUE) # keep OTU_ID column no_reads <- rowSums(obj$data$otu_counts[, sample_data$Sample_ID]) == 0 sum(no_reads) # when sum is used on a TRUE/FALSE vector it counts TRUEs obj <- filter_obs(obj, "otu_counts", ! no_reads, drop_taxa = TRUE) set.seed(8765) # set the random seed for reproducibility ps_obj1 <- as_phyloseq(obj, # alpha diversity indexes per sample otu_table = "otu_counts", otu_id_col = "OTU_ID", sample_data = sample_data, sample_id_col = "Sample_ID")

Thanks again for you help with that, let me know if you need anything else.

gmteunisse commented 1 year ago

Thanks for checking that, and for narrowing down in which function it occurs.

I have tried running your code, but get many errors; I think the files you added do not match your code. For example, subset_data.csv need to be filtered on Trophy in your second line of code, but it contains no column called Trophy. Can you either provide an .rds file with the filtered phyloseq object in it, or provide me with the correct code to turn your tables into phyloseq objects?

NaomiVi commented 1 year ago

Apologise for this.

Here is the phyloseq object (.rds): https://drive.google.com/file/d/1Btvv1AdnbtT9MoCMoCWAYWGFnNGSRgCG/view?usp=sharing

Thanks again for your help.

gmteunisse commented 1 year ago

Thanks Naomi. I've managed to trace down the errors and resolve them.

Some of the errors could be traced back to how I coded these functions, which I've fixed in commit https://github.com/gmteunisse/fantaxtic/commit/a058f8f6de1a40a8796f85ee7c6a58cc64507484. Please reinstall the package to get the update.

However, some of the errors were related to your taxonomy table. The t rank stores non-taxonomic data, and unfortunately it was placed at the start of the table. Because fantaxtic assumes that data is nested, this leads to problems. My suggestion is to either remove the column completely, or if you want to somehow incorporate it in the visualization, to add it to the lowest taxonomic rank of the visualization. See the code below for examples.

Let me know if that resolves your issue.

require("phyloseq")
#> Loading required package: phyloseq
require("tidyverse")
#> Loading required package: tidyverse
require("fantaxtic")
#> Loading required package: fantaxtic

# Read in the data
ps <- readRDS("psobj.rds")
ps1 <- ps
ps2 <- ps

# Remove the t column
remove_ps_rank <- function(ps_obj, rank){
  tax_tab <- tax_table(ps1)
  i <- which(colnames(tax_tab) == rank)
  tax_tab <- tax_tab[,-i]
  tax_table(ps_obj) <- tax_table(tax_tab)
  return(ps_obj)
}

# Get top taxa and plot
ps1 <- remove_ps_rank(ps1, "t")
top <- nested_top_taxa(ps_obj = ps1, top_tax_level = "p", nested_tax_level = "c", n_top_taxa = 7, n_nested_taxa = 3)
plot_nested_bar(top$ps_obj, top_level = "p", nested_level = "c")

# Add column t information to column c
merge_tax_ranks <- function(ps_obj, col_to_keep, col_to_remove, label = "<keep> (<remove>)"){
  tax_tab <- tax_table(ps_obj) %>% as.data.frame()
  tax_tab <- tax_tab %>%
    mutate(!!col_to_keep := label %>%
             str_replace("<keep>", !!sym(col_to_keep)) %>%
             str_replace("<remove>", !!sym(col_to_remove))) %>%
    select(!sym(!!col_to_remove))
  tax_table(ps_obj) <- tax_table(as.matrix(tax_tab))
  return(ps_obj)
}

# Get to taxa and plot
ps2 <- merge_tax_ranks(ps2, "c", "t")
top <- nested_top_taxa(ps_obj = ps2, top_tax_level = "p", nested_tax_level = "c", n_top_taxa = 7, n_nested_taxa = 3)
plot_nested_bar(top$ps_obj, top_level = "p", nested_level = "c")

Created on 2022-10-12 by the reprex package (v2.0.1)

NaomiVi commented 1 year ago

Hi,

I have run my data using your recommendations and managed to finally get the figures I wanted. Thank you very much for helping ! The barplots are for a paper revision so I will make sure to add the 'fantaxtic' package to the method section of the revised manuscript.

Thanks again, Naomi

gmteunisse commented 1 year ago

Glad it worked, and happy to help! Good luck with the revisions.