david-barnett / microViz

R package for microbiome data visualization and statistics. Uses phyloseq, vegan and the tidyverse. Docker image available.
https://david-barnett.github.io/microViz/
GNU General Public License v3.0

The `tax_filter` argument `min_sample_abundance` does not recognise proportions #128

Closed hkaspersen closed 2 months ago

hkaspersen commented 1 year ago

Hello, and thank you for an excellent R package! I have a dataset from which I want to prune taxa that have a relative abundance of less than 0.1% at the genus level (Rank6).

physeq_obj %>%
    tax_filter(min_sample_abundance = 0.001, verbose = FALSE, tax_level = "Rank6")

However, when I do this, the physeq object is identical after filtering. What am I doing wrong here? The input data are untransformed counts.

physeq object:

phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 10292 taxa and 44 samples ]
sample_data() Sample Data:       [ 44 samples by 9 sample variables ]
tax_table()   Taxonomy Table:    [ 10292 taxa by 7 taxonomic ranks ]

microViz package version 0.10.10, R version 4.3.0

david-barnett commented 1 year ago

Hi Hakon,

Thanks for reporting this, as it is a bug!

The min_sample_abundance argument is not treating the 0.001 as a proportion. ~I'll fix this in the next version, but~ for now, here is a workaround that hopefully meets your needs.

library(microViz)
#> microViz version 0.11.0 - Copyright (C) 2023 David Barnett
#> ! Website: https://david-barnett.github.io/microViz
#> ✔ Useful?  For citation details, run: `citation("microViz")`
#> ✖ Silence? `suppressPackageStartupMessages(library(microViz))`

# built-in example phyloseq data
data("shao19")

# Bug: values between 0 and 1 are expected to be treated as a proportion of the counts in each sample,
# but that conversion does not happen. Instead, all non-absent taxa pass the threshold.
shao19 %>% tax_filter(min_sample_abundance = 0.01, tax_level = "genus")
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 819 taxa and 1644 samples ]
#> sample_data() Sample Data:       [ 1644 samples by 11 sample variables ]
#> tax_table()   Taxonomy Table:    [ 819 taxa by 6 taxonomic ranks ]
#> phy_tree()    Phylogenetic Tree: [ 819 tips and 818 internal nodes ]

# Workaround: transform to compositional, then filter.
# Next, retrieve the stored counts (if you don't want to continue with proportions).
shao19 %>% 
  tax_transform("compositional") %>% 
  tax_filter(
    min_sample_abundance = 0.01, tax_level = "genus", use_counts = FALSE,
    prev_detection_threshold = 0 # default of 1 expects counts
  ) %>% 
  ps_get(counts = TRUE)
#> phyloseq-class experiment-level object
#> otu_table()   OTU Table:         [ 651 taxa and 1644 samples ]
#> sample_data() Sample Data:       [ 1644 samples by 11 sample variables ]
#> tax_table()   Taxonomy Table:    [ 651 taxa by 7 taxonomic ranks ]
#> phy_tree()    Phylogenetic Tree: [ 651 tips and 650 internal nodes ]

Created on 2023-10-16 with reprex v2.0.2
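
To see why every non-absent taxon passes the threshold on raw counts, it may help to work through a toy example in base R. This is only a sketch of the arithmetic, not microViz's internal code: the matrix values and variable names are made up, and it assumes the threshold is compared against each taxon's maximum abundance in any single sample.

```r
# Toy count matrix: rows are taxa, columns are samples (made-up numbers)
counts <- matrix(
  c(900, 850,
     95, 149,
      5,   1,
      0,   0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("taxonA", "taxonB", "taxonC", "taxonD"), c("s1", "s2"))
)

threshold <- 0.01 # intended meaning: 1% of each sample

# Assumption: a taxon is kept if its maximum abundance in any sample
# reaches the threshold.
# On raw counts, any taxon with at least one read has a maximum >= 1,
# which is always above 0.01, so only all-zero taxa are dropped.
max_count <- apply(counts, 1, max)
keep_on_counts <- names(max_count)[max_count >= threshold]

# Dividing each sample (column) by its total gives proportions,
# and the same threshold then behaves as intended.
props <- sweep(counts, 2, colSums(counts), "/")
max_prop <- apply(props, 1, max)
keep_on_props <- names(max_prop)[max_prop >= threshold]

print(keep_on_counts)
print(keep_on_props)
```

Here taxonC never exceeds 0.5% of a sample, so it is removed only once the counts are converted to proportions, which is what the `tax_transform("compositional")` step in the workaround does before filtering.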

david-barnett commented 10 months ago

I am changing the documentation in microViz version 0.12.0 to remove the (erroneous) suggestion that min_sample_abundance could handle proportions. For now, I don't have time to actually add this feature.

david-barnett commented 2 months ago

For now I'll close this.

I hope to rebuild a better tax_filter in future projects, but this is no longer a bug, as the current docs no longer erroneously indicate that min_sample_abundance can handle proportions.