FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7
107 stars 29 forks source link

Analysis of ASV or fucntion table #106

Closed jylee324 closed 1 year ago

jylee324 commented 2 years ago

Hello,

I wonder if ancombc2() could analyze ASV or OTU-level abundance table and GENE or PATHWAY level abundance table (from shotgun metagenome analysis).

In my test running of ancombc2(), tax_level = NULL automatically agglomerates taxa to a species level. My input file is phyloseq object. Code:

out <- ancombc2(
  data = phyloseq, assay_name = "counts", tax_level = NULL, 
  fix_formula = "var1", rand_formula = NULL, p_adj_method = "BH", pseudo = 0, pseudo_sens = TRUE,
  prv_cut = 0.1, lib_cut = 1000, s0_perc = 0.05, group = "var1", struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05,
  n_cl = 4, verbose = TRUE, global = FALSE, pairwise = FALSE, dunnet = FALSE, trend = FALSE,
  iter_control = list(tol = 0.01, max_iter = 20, verbose = FALSE),
  em_control = list(tol = 1e-05, max_iter = 100),
  mdfdr_control = list(fwer_ctrl_method = "holm", B = 100),
  trend_control = list(contrast = NULL, node = NULL, solver = "ECOS", B = 100)
)
print(out$feature_table %>% rownames() %>% head())

Return:

[1] "Unclassified_Bacteria"                   "Unclassified_Sutterella"                 "Unclassified_Parasutterella"            
[4] "Unclassified_Oxalobacteraceae"           "Unclassified_Pasteurellaceae"            "Unclassified_Clostridia_vadinBB60_group"

In order to analyze at ASV level, I added "ASV" column in tax_table of my phyloseq object.

> tax_df <- tax_table(phyloseq) %>% as.data.frame()
> tax_df$ASV <- rownames(tax_df)
> new_tax <- tax_df %>% as.matrix() %>% tax_table()
> new_phyloseq <- merge_phyloseq(phyloseq, new_tax)
> tax_table(new_phyloseq)
Taxonomy Table:     [ 7938 taxa by 8 taxonomic ranks ]:
                                    Kingdom  Phylum                Class                 Order   Family   Genus   Species   ASV     
                                    <chr>    <chr>                 <chr>                 <chr>   <chr>    <chr>   <chr>     <chr>   
 1 31d89f4ff44dde17bff5552dc3361c86 Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… 31d89f4…
 2 1c4227d669fb9e85fe5b85967bc643f3 Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… 1c4227d…
 3 71be1ab565f829874bde74467b9335d4 Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… 71be1ab…
 4 89c931db0c4303ca053e3fbeaf35a39e Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… 89c931d…
 5 a9237f048d480b1e2ef235a744d04c56 Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… a9237f0…
 6 ba3b77c5dc02d02df002364342713f70 Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… ba3b77c…
 7 6b34e244e0a361cf32d96d16b8ce2ac8 Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… 6b34e24…
 8 4fc5ebaf466922400506c7b875a5d9f9 Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… 4fc5eba…
 9 1f4df647f8d2958ccc88c545b7b3faa9 Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… 1f4df64…
10 241a898ebdce1c26a843d789777ac908 Bacteria Unclassified_Bacteria Unclassified_Bacteria Unclas… Unclass… Unclas… Unclassi… 241a898…
# … with 7,928 more taxa

Then I run ancombc2 again with new_phyloseq, but the results were not changed (Taxa were still agglomerated to a species level).

I didn't test ancombc2 for gene or pathway abundance table yet. However, in my guess, ancombc2 could not deal with those types of abundance tables.

Do you have any plans to encompass those types of data (ASV, gene, pathway) in ancombc2?

Or, is there anything I missed?

Best,

FrederickHuangLin commented 2 years ago

Hi @jylee324,

Thanks for your feedback! I just updated the package to resolve the issue you posted.

For those data with no taxonomic classifications (ASV, gene, pathway), you can leave tax_level = NULL, and no agglomeration will be performed in this case.

The new features of ANCOMBC can be downloaded from its devel version. The release version will be available on BioC 3.16 in early November (BioC 3.16 has not yet been released).

Best, Huang

adamsorbie commented 2 years ago

Hey,

I'm still having the same issue in BioC 3.16 version?

Adam

FrederickHuangLin commented 1 year ago

Hi @adamsorbie,

I just tested ANCOMBC 2.1.1 (devel version, the same as ANCOMBC 2.0.1 in the release version) using simple simulated data. The codes are as follows:

library(ANCOMBC)
library(tidyverse)
library(microbiome)

set.seed(123)
otumat = matrix(sample(1:1000, 100, replace = TRUE), nrow = 10, ncol = 100)
rownames(otumat) = paste0("OTU", 1:nrow(otumat))
colnames(otumat) = paste0("Sample", 1:ncol(otumat))

taxmat = matrix(sample(letters, 80, replace = TRUE), nrow = nrow(otumat), ncol = 8)
rownames(taxmat) = rownames(otumat)
colnames(taxmat) = c("Domain", "Phylum", "Class", "Order", "Family", "Genus", "Species", "OTU")

sampledata = data.frame(Location = sample(LETTERS[1:4], size = 100, replace = TRUE),
                        Depth = sample(50:1000, size = 100, replace = TRUE),
                        stringsAsFactors = FALSE)
rownames(sampledata) = colnames(otumat)

OTU = otu_table(otumat, taxa_are_rows = TRUE)
META = sample_data(sampledata)
sample_names(sampledata) = rownames(sampledata)
TAX = tax_table(taxmat)
pseq = phyloseq(OTU, TAX, META)

tse = mia::makeTreeSummarizedExperimentFromPhyloseq(pseq)

set.seed(123)
output = ancombc2(data = tse, assay_name = "counts", tax_level = NULL,
                  fix_formula = "Location", rand_formula = NULL,
                  p_adj_method = "holm", pseudo = 0, pseudo_sens = FALSE,
                  prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05,
                  group = NULL, struc_zero = FALSE, neg_lb = FALSE,
                  alpha = 0.05, n_cl = 1, verbose = TRUE,
                  global = FALSE, pairwise = FALSE, dunnet = FALSE, trend = FALSE)

res_prim = output$res

It is able to show results at the OTU/ASV level. Which version of ANCOMBC are you using? Could you run the above example and see if there is any issue with it?

The session info is as below:

R version 4.2.1 (2022-06-23)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Ventura 13.0.1

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] microbiome_1.19.1 phyloseq_1.41.0   forcats_0.5.2     stringr_1.4.1     dplyr_1.0.10      purrr_0.3.5       readr_2.1.3      
 [8] tidyr_1.2.1       tibble_3.1.8      ggplot2_3.3.6     tidyverse_1.3.2   ANCOMBC_2.1.1    

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                     tidyselect_1.2.0               lme4_1.1-30                    RSQLite_2.2.18                
  [5] htmlwidgets_1.5.4              grid_4.2.1                     BiocParallel_1.31.14           gmp_0.6-6                     
  [9] Rtsne_0.16                     munsell_0.5.0                  ScaledMatrix_1.5.1             codetools_0.2-18              
 [13] interp_1.1-3                   withr_2.5.0                    colorspace_2.0-3               Biobase_2.57.1                
 [17] energy_1.7-10                  knitr_1.40                     rstudioapi_0.14                stats4_4.2.1                  
 [21] SingleCellExperiment_1.19.1    DescTools_0.99.46              MatrixGenerics_1.9.1           Rdpack_2.4                    
 [25] emmeans_1.8.1-1                GenomeInfoDbData_1.2.9         bit64_4.0.5                    rhdf5_2.41.1                  
 [29] vctrs_0.5.0                    treeio_1.21.2                  generics_0.1.3                 xfun_0.34                     
 [33] R6_2.5.1                       doParallel_1.0.17              GenomeInfoDb_1.33.14           ggbeeswarm_0.6.0              
 [37] rsvd_1.0.5                     rhdf5filters_1.9.0             bitops_1.0-7                   cachem_1.0.6                  
 [41] DelayedArray_0.23.2            assertthat_0.2.1               scales_1.2.1                   nnet_7.3-18                   
 [45] googlesheets4_1.0.1            beeswarm_0.4.0                 rootSolve_1.8.2.3              gtable_0.3.1                  
 [49] beachmat_2.13.4                lmom_2.9                       rlang_1.0.6                    splines_4.2.1                 
 [53] lazyeval_0.2.2                 gargle_1.2.1                   broom_1.0.1                    checkmate_2.1.0               
 [57] reshape2_1.4.4                 modelr_0.1.9                   backports_1.4.1                Hmisc_4.7-1                   
 [61] tools_4.2.1                    ellipsis_0.3.2                 decontam_1.17.0                biomformat_1.25.0             
 [65] RColorBrewer_1.1-3             proxy_0.4-27                   BiocGenerics_0.43.4            MultiAssayExperiment_1.23.11  
 [69] Rcpp_1.0.9                     plyr_1.8.7                     base64enc_0.1-3                sparseMatrixStats_1.9.0       
 [73] zlibbioc_1.43.0                RCurl_1.98-1.9                 rpart_4.1.19                   TreeSummarizedExperiment_2.5.0
 [77] deldir_1.0-6                   viridis_0.6.2                  S4Vectors_0.35.4               SummarizedExperiment_1.27.3   
 [81] haven_2.5.1                    ggrepel_0.9.1                  cluster_2.1.4                  fs_1.5.2                      
 [85] DECIPHER_2.25.4                magrittr_2.0.3                 data.table_1.14.4              lmerTest_3.1-3                
 [89] reprex_2.0.2                   googledrive_2.0.0              mvtnorm_1.1-3                  matrixStats_0.62.0            
 [93] gsl_2.1-7.1                    hms_1.1.2                      xtable_1.8-4                   jpeg_0.1-9                    
 [97] readxl_1.4.1                   IRanges_2.31.2                 gridExtra_2.3                  compiler_4.2.1                
[101] scater_1.25.7                  crayon_1.5.2                   minqa_1.2.5                    htmltools_0.5.3               
[105] mgcv_1.8-41                    tzdb_0.3.0                     Formula_1.2-4                  expm_0.999-6                  
[109] Exact_3.2                      lubridate_1.8.0                DBI_1.1.3                      dbplyr_2.2.1                  
[113] MASS_7.3-58.1                  boot_1.3-28                    ade4_1.7-19                    Matrix_1.5-1                  
[117] permute_0.9-7                  cli_3.4.1                      rbibutils_2.2.9                igraph_1.3.5                  
[121] parallel_4.2.1                 GenomicRanges_1.49.1           pkgconfig_2.0.3                numDeriv_2016.8-1.1           
[125] foreign_0.8-83                 scuttle_1.7.4                  xml2_1.3.3                     foreach_1.5.2                 
[129] vipor_0.4.5                    DirichletMultinomial_1.39.0    rngtools_1.5.2                 multtest_2.53.0               
[133] XVector_0.37.1                 estimability_1.4.1             mia_1.5.17                     rvest_1.0.3                   
[137] CVXR_1.0-10                    doRNG_1.8.2                    yulab.utils_0.0.5              digest_0.6.30                 
[141] vegan_2.6-4                    Biostrings_2.65.6              cellranger_1.1.0               tidytree_0.4.1                
[145] htmlTable_2.4.1                gld_2.6.5                      DelayedMatrixStats_1.19.2      nloptr_2.0.3                  
[149] lifecycle_1.0.3                nlme_3.1-160                   jsonlite_1.8.3                 Rhdf5lib_1.19.2               
[153] BiocNeighbors_1.15.1           viridisLite_0.4.1              fansi_1.0.3                    pillar_1.8.1                  
[157] lattice_0.20-45                fastmap_1.1.0                  httr_1.4.4                     survival_3.4-0                
[161] glue_1.6.2                     png_0.1-7                      iterators_1.0.14               bit_4.0.4                     
[165] class_7.3-20                   stringi_1.7.8                  blob_1.2.3                     BiocSingular_1.13.1           
[169] latticeExtra_0.6-30            memoise_2.0.1                  Rmpfr_0.8-9                    irlba_2.3.5.1  

Best, Huang

adamsorbie commented 1 year ago

Hi, I'm no longer having this problem thanks! I think the update to BioConductor 3.16 wasn't successfull at first which I guess was the reason for the error.