david-barnett / microViz

R package for microbiome data visualization and statistics. Uses phyloseq, vegan and the tidyverse. Docker image available.
https://david-barnett.github.io/microViz/
GNU General Public License v3.0

dominant taxa in each sample #141

Closed momodiame21 closed 5 months ago

momodiame21 commented 8 months ago

I have two issues.

1) In the attached figure, some samples clearly have a dominant genus but are labelled 'other' on the bar. For example, the last sample in the figure is dominated by Pseudomonas but is labelled 'other'.

2) There seem to be 2 samples in which 2 genera (Streptococcus, Haemophilus) each reach the 30% abundance threshold. One sample is labelled Haemophilus and the other Streptococcus. Is it possible to name such samples as, e.g., Streptococcus-Haemophilus dominant?

ps <- ps_calc_dominant(
  ps = POb_bac_filtered, rank = "Genus", threshold = 0.30,
  var = "dominant_genus", n_max = 17, none = "diverse", other = "other"
)

with(sample_data(ps), table(dominant_genus))
#> dominant_genus
#>        Actinobacillus        Alloprevotella       Brachybacterium       Cloacibacterium
#>                     1                     4                     1                     1
#>       Corynebacterium               diverse         Fusobacterium           Haemophilus
#>                     2                   697                     6                     4
#>             Lautropia             Moraxella             Neisseria Pasteurellaceae Genus
#>                     1                    15                     5                     1
#>         Porphyromonas            Prevotella           Pseudomonas        Staphylococcus
#>                     7                     9                     9                     5
#>       Streptobacillus         Streptococcus
#>                     1                    29

ps_subset <- subset_samples(ps, dominant_genus != "diverse")

ps_subset %>%
  ps_filter(Spn == "Pos") %>%
  ps_calc_dominant(rank = "Genus") %>%
  comp_barplot(tax_level = "Genus", label = "dominant_Genus", n_taxa = 17) +
  coord_flip()

![plot_zoom](https://github.com/david-barnett/microViz/assets/162585823/816d05cb-6057-4cdf-88dc-7bd0389c15b4)

david-barnett commented 8 months ago

image

1) I think the image shows the expected behaviour of the ps_calc_dominant function: the taxa that are labelled are the ones that most frequently dominate the samples in your dataset. "other" means that the sample is dominated by a taxon that is not among those most frequent dominators. As Pseudomonas dominance only occurs twice, it isn't labelled by name. To get specific labels for more (or all) taxa, you could keep raising the n_max argument.

2) There's no functionality specifically for checking whether a given pair of taxa is jointly above a threshold; you'd have to do something like that manually, e.g.

library(dplyr)
library(microViz)

threshold <- 0.3

# Flag samples where Streptococcus and Haemophilus together exceed the
# threshold although neither genus exceeds it alone
ps %>%
  tax_transform("compositional", rank = "Genus") %>% # proportions per genus
  otu_get() %>%
  as.data.frame() %>%
  as_tibble(rownames = "sample") %>% # keep sample names as a column
  mutate(Strep_and_Haemo = Streptococcus + Haemophilus) %>%
  mutate(
    S_and_H_dominate_together =
      Strep_and_Haemo > threshold & Streptococcus <= threshold & Haemophilus <= threshold
  ) %>%
  select(sample, S_and_H_dominate_together, Strep_and_Haemo, Streptococcus, Haemophilus)

david-barnett commented 8 months ago

Oops, I did not read your second question correctly:

> There seems to be 2 samples in which 2 genera have at least 30% abundance threshold

But still, you would have to do this manually; there is no built-in functionality for this.
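For the case actually asked about (two genera each reaching the threshold in the same sample), one manual approach can be sketched in base R. This is only a sketch under stated assumptions, not microViz functionality: the helper `label_codominant()` and the toy `props` matrix are hypothetical, the input is assumed to be a matrix of genus proportions with samples as rows, and "at least 30%" is read as `>=`. Each sample is labelled with every genus at or above the threshold, joined in alphabetical order so that the same pair always gets the same label.

```r
# Toy genus-level proportions (made-up numbers): samples as rows, genera as columns.
props <- rbind(
  s1 = c(Streptococcus = 0.35, Haemophilus = 0.40, Pseudomonas = 0.05, Neisseria = 0.20),
  s2 = c(Streptococcus = 0.70, Haemophilus = 0.10, Pseudomonas = 0.10, Neisseria = 0.10),
  s3 = c(Streptococcus = 0.25, Haemophilus = 0.25, Pseudomonas = 0.25, Neisseria = 0.25)
)

# Hypothetical helper: label each sample with all genera at or above `threshold`.
# Names are sorted alphabetically so a given pair always yields one label;
# samples with no genus reaching the threshold get the `none` label.
label_codominant <- function(props, threshold = 0.30, none = "diverse", sep = "-") {
  apply(props, 1, function(x) {
    hits <- sort(names(x)[which(x >= threshold)])
    if (length(hits) == 0) none else paste(hits, collapse = sep)
  })
}

dominance_label <- label_codominant(props, threshold = 0.30)
print(dominance_label)
```

With real data, the proportions could come from the `tax_transform("compositional", rank = "Genus")` and `otu_get()` steps shown earlier in the thread, converted to a plain matrix, and the resulting labels could then be matched back to the sample data by sample name.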