Biomarker discovery explanation and new package's feature discussion

YuLab-SMU / MicrobiotaProcess

:microbe: A comprehensive R package for deep mining microbiome

https://www.sciencedirect.com/science/article/pii/S2666675823000164

175 stars 37 forks source link

Biomarker discovery explanation and new package's feature discussion #35

Open alienzj opened 2 years ago

alienzj commented 2 years ago

Dear MPSE developer,

Recently I try to use MicrobiotaProcess to do microbiome data analysis. This is a very nice R package as far as I know.

So, I installed it and learn it based on the documentation.

I encountered two problem in reproducing the biomarker analysis.

and:

Any help would be greatly appreciated.

alienzj commented 2 years ago

─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 setting  value                       
 version  R version 4.1.1 (2021-08-10)
 os       Arch Linux                  
 system   x86_64, linux-gnu           
 ui       RStudio                     
 language (EN)                        
 collate  en_US.UTF-8                 
 ctype    en_US.UTF-8                 
 tz       Asia/Hong_Kong              
 date     2021-10-12                  

─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
 package              * version  date       lib source        
 ade4                   1.7-18   2021-09-16 [1] CRAN (R 4.1.1)
 ape                    5.5      2021-04-25 [1] CRAN (R 4.1.1)
 aplot                  0.1.1    2021-09-22 [1] CRAN (R 4.1.1)
 assertthat             0.2.1    2019-03-21 [1] CRAN (R 4.1.1)
 Biobase                2.52.0   2021-05-19 [1] Bioconductor  
 BiocGenerics           0.38.0   2021-05-19 [1] Bioconductor  
 BiocManager            1.30.16  2021-06-15 [1] CRAN (R 4.1.1)
 biomformat             1.20.0   2021-05-19 [1] Bioconductor  
 Biostrings             2.60.2   2021-08-05 [1] Bioconductor  
 bitops                 1.0-7    2021-04-24 [1] CRAN (R 4.1.1)
 bslib                  0.3.1    2021-10-06 [1] CRAN (R 4.1.1)
 cli                    3.0.1    2021-07-17 [1] CRAN (R 4.1.1)
 cluster                2.1.2    2021-04-17 [2] CRAN (R 4.1.1)
 codetools              0.2-18   2020-11-04 [2] CRAN (R 4.1.1)
 coin                   1.4-2    2021-10-08 [1] CRAN (R 4.1.1)
 colorspace             2.0-2    2021-06-24 [1] CRAN (R 4.1.1)
 corrr                  0.4.3    2020-11-24 [1] CRAN (R 4.1.1)
 crayon                 1.4.1    2021-02-08 [1] CRAN (R 4.1.1)
 data.table             1.14.2   2021-09-27 [1] CRAN (R 4.1.1)
 DBI                    1.1.1    2021-01-15 [1] CRAN (R 4.1.1)
 DelayedArray           0.18.0   2021-05-19 [1] Bioconductor  
 digest                 0.6.28   2021-09-23 [1] CRAN (R 4.1.1)
 dplyr                  1.0.7    2021-06-18 [1] CRAN (R 4.1.1)
 ellipsis               0.3.2    2021-04-29 [1] CRAN (R 4.1.1)
 evaluate               0.14     2019-05-28 [1] CRAN (R 4.1.1)
 fansi                  0.5.0    2021-05-25 [1] CRAN (R 4.1.1)
 farver                 2.1.0    2021-02-28 [1] CRAN (R 4.1.1)
 fastmap                1.1.0    2021-01-25 [1] CRAN (R 4.1.1)
 forcats              * 0.5.1    2021-01-27 [1] CRAN (R 4.1.1)
 foreach                1.5.1    2020-10-15 [1] CRAN (R 4.1.1)
 generics               0.1.0    2020-10-31 [1] CRAN (R 4.1.1)
 GenomeInfoDb           1.28.4   2021-09-05 [1] Bioconductor  
 GenomeInfoDbData       1.2.6    2021-08-12 [1] Bioconductor  
 GenomicRanges          1.44.0   2021-05-19 [1] Bioconductor  
 ggalluvial             0.12.3   2020-12-05 [1] CRAN (R 4.1.1)
 ggfun                  0.0.4    2021-09-17 [1] CRAN (R 4.1.1)
 gghalves               0.1.1    2020-11-08 [1] CRAN (R 4.1.1)
 ggnewscale             0.4.5    2021-01-11 [1] CRAN (R 4.1.1)
 ggplot2              * 3.3.5    2021-06-25 [1] CRAN (R 4.1.1)
 ggplotify              0.1.0    2021-09-02 [1] CRAN (R 4.1.1)
 ggrepel                0.9.1    2021-01-15 [1] CRAN (R 4.1.1)
 ggside                 0.1.2    2021-07-21 [1] CRAN (R 4.1.1)
 ggsignif               0.6.3    2021-09-09 [1] CRAN (R 4.1.1)
 ggstar               * 1.0.2    2021-04-07 [1] CRAN (R 4.1.1)
 ggtree               * 3.0.4    2021-08-22 [1] Bioconductor  
 ggtreeExtra          * 1.2.3    2021-09-12 [1] Bioconductor  
 glue                   1.4.2    2020-08-27 [1] CRAN (R 4.1.1)
 gridExtra              2.3      2017-09-09 [1] CRAN (R 4.1.1)
 gridGraphics           0.5-1    2020-12-13 [1] CRAN (R 4.1.1)
 gtable                 0.3.0    2019-03-25 [1] CRAN (R 4.1.1)
 htmltools              0.5.2    2021-08-25 [1] CRAN (R 4.1.1)
 igraph                 1.2.6    2020-10-06 [1] CRAN (R 4.1.1)
 IRanges                2.26.0   2021-05-19 [1] Bioconductor  
 iterators              1.0.13   2020-10-15 [1] CRAN (R 4.1.1)
 jquerylib              0.1.4    2021-04-26 [1] CRAN (R 4.1.1)
 jsonlite               1.7.2    2020-12-09 [1] CRAN (R 4.1.1)
 knitr                  1.36     2021-09-29 [1] CRAN (R 4.1.1)
 labeling               0.4.2    2020-10-20 [1] CRAN (R 4.1.1)
 lattice                0.20-44  2021-05-02 [2] CRAN (R 4.1.1)
 lazyeval               0.2.2    2019-03-15 [1] CRAN (R 4.1.1)
 libcoin                1.0-9    2021-09-27 [1] CRAN (R 4.1.1)
 lifecycle              1.0.1    2021-09-24 [1] CRAN (R 4.1.1)
 magrittr               2.0.1    2020-11-17 [1] CRAN (R 4.1.1)
 MASS                   7.3-54   2021-05-03 [2] CRAN (R 4.1.1)
 Matrix                 1.3-4    2021-06-01 [2] CRAN (R 4.1.1)
 MatrixGenerics         1.4.3    2021-08-26 [1] Bioconductor  
 matrixStats            0.61.0   2021-09-17 [1] CRAN (R 4.1.1)
 mgcv                   1.8-36   2021-06-01 [2] CRAN (R 4.1.1)
 MicrobiotaProcess    * 1.4.4    2021-10-03 [1] Bioconductor  
 modeltools             0.2-23   2020-03-05 [1] CRAN (R 4.1.1)
 multcomp               1.4-17   2021-04-29 [1] CRAN (R 4.1.1)
 multtest               2.48.0   2021-05-19 [1] Bioconductor  
 munsell                0.5.0    2018-06-12 [1] CRAN (R 4.1.1)
 mvtnorm                1.1-3    2021-10-08 [1] CRAN (R 4.1.1)
 nlme                   3.1-152  2021-02-04 [2] CRAN (R 4.1.1)
 patchwork              1.1.1    2020-12-17 [1] CRAN (R 4.1.1)
 permute                0.9-5    2019-03-12 [1] CRAN (R 4.1.1)
 phyloseq               1.36.0   2021-05-19 [1] Bioconductor  
 pillar                 1.6.3    2021-09-26 [1] CRAN (R 4.1.1)
 pkgconfig              2.0.3    2019-09-22 [1] CRAN (R 4.1.1)
 plyr                   1.8.6    2020-03-03 [1] CRAN (R 4.1.1)
 purrr                  0.3.4    2020-04-17 [1] CRAN (R 4.1.1)
 R6                     2.5.1    2021-08-19 [1] CRAN (R 4.1.1)
 Rcpp                   1.0.7    2021-07-07 [1] CRAN (R 4.1.1)
 RCurl                  1.98-1.5 2021-09-17 [1] CRAN (R 4.1.1)
 reshape2               1.4.4    2020-04-09 [1] CRAN (R 4.1.1)
 rhdf5                  2.36.0   2021-05-19 [1] Bioconductor  
 rhdf5filters           1.4.0    2021-05-19 [1] Bioconductor  
 Rhdf5lib               1.14.2   2021-07-06 [1] Bioconductor  
 rlang                  0.4.11   2021-04-30 [1] CRAN (R 4.1.1)
 rmarkdown              2.11     2021-09-14 [1] CRAN (R 4.1.1)
 rstudioapi             0.13     2020-11-12 [1] CRAN (R 4.1.1)
 S4Vectors              0.30.2   2021-10-03 [1] Bioconductor  
 sandwich               3.0-1    2021-05-18 [1] CRAN (R 4.1.1)
 sass                   0.4.0    2021-05-12 [1] CRAN (R 4.1.1)
 scales                 1.1.1    2020-05-11 [1] CRAN (R 4.1.1)
 sessioninfo            1.1.1    2018-11-05 [1] CRAN (R 4.1.1)
 stringi                1.7.5    2021-10-04 [1] CRAN (R 4.1.1)
 stringr                1.4.0    2019-02-10 [1] CRAN (R 4.1.1)
 SummarizedExperiment   1.22.0   2021-05-19 [1] Bioconductor  
 survival               3.2-11   2021-04-26 [2] CRAN (R 4.1.1)
 TH.data                1.1-0    2021-09-27 [1] CRAN (R 4.1.1)
 tibble                 3.1.5    2021-09-30 [1] CRAN (R 4.1.1)
 tidyr                  1.1.4    2021-09-27 [1] CRAN (R 4.1.1)
 tidyselect             1.1.1    2021-04-30 [1] CRAN (R 4.1.1)
 tidytree             * 0.3.5    2021-09-08 [1] CRAN (R 4.1.1)
 treeio                 1.16.2   2021-08-17 [1] Bioconductor  
 utf8                   1.2.2    2021-07-24 [1] CRAN (R 4.1.1)
 vctrs                  0.3.8    2021-04-29 [1] CRAN (R 4.1.1)
 vegan                  2.5-7    2020-11-28 [1] CRAN (R 4.1.1)
 viridisLite            0.4.0    2021-04-13 [1] CRAN (R 4.1.1)
 withr                  2.4.2    2021-04-18 [1] CRAN (R 4.1.1)
 xfun                   0.26     2021-09-14 [1] CRAN (R 4.1.1)
 XVector                0.32.0   2021-05-19 [1] Bioconductor  
 yaml                   2.2.1    2020-02-01 [1] CRAN (R 4.1.1)
 yulab.utils            0.0.4    2021-10-09 [1] CRAN (R 4.1.1)
 zlibbioc               1.38.0   2021-05-19 [1] Bioconductor  
 zoo                    1.8-9    2021-03-09 [1] CRAN (R 4.1.1)

[1] /home/zhujie/.R
[2] /usr/lib/R/library

xiangpin commented 2 years ago

This is because the geom_hilight of ggtree( released version) does not support the function for data parameter. This feature is supported by ggtree >= 3.1.3. But, you can use subset=nodeClass == Phylum in aes to plot it in here.

tp2 <- tp1 +
      geom_hilight(
        #data = td_filter(nodeClass == "Phylum"), # This is only supported by ggtree >=3.1.3.
        mapping = aes(subset = nodeClass == "Phylum", 
                      node = node, 
                      fill = label)
      )

The document you refer to is the development version (Bioconductor 3.14), you can refer to the document since you are using the released version (Bioconductor 3.13)

alienzj commented 2 years ago

@xiangpin Thank you for your quick reply. Nice!

But I meet the second problem.

Error: Can't subset columns that don't exist. x Column `RareAbundanceBySample` doesn't exist. Run `rlang::last_error()` to see where the error occurred.

At the same time, how can I use the usage in the document on the metaphlan3 profile?

Many thanks to your help !

xiangpin commented 2 years ago

Please check column names of associated data of the treedata have the RareAbundanceBySample.

> mouse.time.mpse %>% 
    mp_rrarefy() %>% 
    mp_cal_abundance %>% 
    mp_diff_analysis(
             .abundance = RelRareAbundanceBySample, 
             .group = time, 
             first.test.alpha = 0.01) %>% 
    mp_extract_tree("taxatree")
The otutree is empty in the MPSE object!
The RareAbundance was in the MPSE object, please check whether it has been rarefied !
The otutree is empty in the MPSE object!
'treedata' S4 object'.

...@ phylo:

Phylogenetic tree with 232 tips and 218 internal nodes.

Tip labels:
  OTU_58, OTU_67, OTU_231, OTU_188, OTU_150, OTU_207, ...
Node labels:
  r__root, k__Bacteria, p__Actinobacteria, p__Bacteroidetes, p__Cyanobacteria, p__Deinococcus-Thermus, ...

Rooted; no branch lengths.

with the following features available:
        'nodeClass',    'nodeDepth',    'RareAbundanceBySample',        'LDAupper',     'LDAmean',
        'LDAlower',     'Sign_time',    'pvalue',       'fdr'.

# The associated data tibble abstraction: 450 × 12
# The 'node', 'label' and 'isTip' are from the phylo tree.
    node label   isTip nodeClass nodeDepth RareAbundanceBySamp… LDAupper LDAmean
   <dbl> <chr>   <lgl> <chr>         <dbl> <list>                  <dbl>   <dbl>
 1     1 OTU_58  TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 2     2 OTU_67  TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 3     3 OTU_231 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 4     4 OTU_188 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 5     5 OTU_150 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 6     6 OTU_207 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 7     7 OTU_5   TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 8     8 OTU_80  TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
 9     9 OTU_163 TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
10    10 OTU_1   TRUE  OTU               8 <tibble [19 × 4]>          NA      NA
# … with 440 more rows, and 4 more variables: LDAlower <dbl>, Sign_time <chr>,
#   pvalue <dbl>, fdr <dbl>

I think the abundance of your metaphlan3 profile is the relative abundance not count. The mp_rrarefy requires the abundance must be count (integer) (and mp_cal_alpha also requires count (default)). To solve the problem, you can multiple the sample depth, then round to keep the integer part (this is also the solution of curatedMetagenomicData). Or you can use the relative abundance to perform the following analysis directly with force=TRUE in some functions (?mp_cal_alpha to see the help information of the function). This means the specified .abundance will be used to calculate directly not be rarefied (default). You can refer to the following examples

> library(MicrobiotaProcess)
> file1 <- system.file("extdata/MetaPhlAn", "metaphlan_test.txt", package="MicrobiotaProcess")
> sample.file <- system.file("extdata/MetaPhlAn", "sample_test.txt", package="MicrobiotaProcess")
> mpse1 <- mp_import_metaphlan(profile=file1, mapfilename=sample.file)
Warning message:
The number of features in otu table is not equal the number of features in taxonomy annotation.
* The same features will be extract automatically !
> mpse1
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 11
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group taxid  Kingdom Phylum Class Order Family Genus
   <chr>  <chr>       <dbl> <chr> <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 s__Me… GupDM_…     0.596 testA 2157|… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
 2 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 3 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 4 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 5 s__Bi… GupDM_…     0.948 testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 7 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 8 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 9 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
10 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
# … with 5,250 more rows
> mpse1 %>% mp_cal_alpha(.abundance=Abundance, force=TRUE)
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 15
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU      Sample Abundance group Observe Shannon Simpson     J taxid   Kingdom
   <chr>    <chr>      <dbl> <chr>   <int>   <dbl>   <dbl> <dbl> <chr>   <chr>
 1 s__Meth… GupDM…     0.596 testA      86    3.47   0.951 0.778 2157|2… k__Arc…
 2 s__Acti… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 3 s__Acti… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 4 s__Acti… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 5 s__Bifi… GupDM…     0.948 testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 6 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 7 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 8 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
 9 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
10 s__Bifi… GupDM…     0     testA      86    3.47   0.951 0.778 2|2011… k__Bac…
# … with 5,250 more rows, and 5 more variables: Phylum <chr>, Class <chr>, Order <chr>,
#   Family <chr>, Genus <chr>
> mpse1 %>% mp_cal_alpha(.abundance=Abundance, force=TRUE) %>% mp_plot_alpha

alpha

> mpse1 %>% mp_cal_dist(.abundance=Abundance)
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 12
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group bray   taxid Kingdom Phylum Class Order Family
   <chr>  <chr>       <dbl> <chr> <list> <chr> <chr>   <chr>  <chr> <chr> <chr>
 1 s__Me… GupDM_…     0.596 testA <tibb… 2157… k__Arc… p__Eu… c__M… o__M… f__Me…
 2 s__Ac… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__A… f__Ac…
 3 s__Ac… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__A… f__Ac…
 4 s__Ac… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__A… f__Ac…
 5 s__Bi… GupDM_…     0.948 testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
 6 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
 7 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
 8 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
 9 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
10 s__Bi… GupDM_…     0     testA <tibb… 2|20… k__Bac… p__Ac… c__A… o__B… f__Bi…
# … with 5,250 more rows, and 1 more variable: Genus <chr>
>mpse1 %>% mp_cal_dist(.abundance=Abundance) %>% mp_plot_dist(.distmethod=bray)

dist

>mpse1 %>% mp_cal_pcoa(.abundance=Abundance, distmethod="bray") %>% mp_plot_ord(.ord=pcoa, show.sample=T)

pcoa

alienzj commented 2 years ago

@xiangpin Many many thanks for your help. Now, I begin to enjoy the microbiome data exploration with MicrobiotaProcess. Thanks !

alienzj commented 2 years ago

@xiangpin Hi, I need your help~

alienzj commented 2 years ago

The mpse3_d0_bnt is a MPSE object constructed with MetaPhlAn3 profile. I have no idea about .abundance=?. Could you please help me to solve this problem?

xiangpin commented 2 years ago

The MPSE object created from MetaPhlAn3 profile only contained the relative abundance of species (the count is also accepted if the relative abundance multiply by the depth of samples). Then the relative abundance of other taxonomy levels can be calculated using mp_cal_abundance with force=TRUE. then if you want to perform the different analyses, you can use mp_diff_analysis (it will check whether the abundance of all taxonomy has been calculated, if it is calculated then it will be used directly, otherwise, it will be calculated in the function internal using mp_cal_abundance) with the specified abundance column name of MPSE. You can use the following examples to run it.

>library(MicrobiotaProcess)
>file1 <- system.file("extdata/MetaPhlAn", "metaphlan_test.txt", package="MicrobiotaProcess")
>sample.file <- system.file("extdata/MetaPhlAn", "sample_test.txt", package="MicrobiotaProcess")
>mpse1 <- mp_import_metaphlan(profile=file1, mapfilename=sample.file)
>mpse1
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 11
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group taxid  Kingdom Phylum Class Order Family Genus
   <chr>  <chr>       <dbl> <chr> <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 s__Me… GupDM_…     0.596 testA 2157|… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
 2 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 3 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 4 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 5 s__Bi… GupDM_…     0.948 testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 7 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 8 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 9 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
10 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
# … with 5,250 more rows
Warning message:
The number of features in otu table is not equal the number of features in taxonomy annotation.
* The same features will be extract automatically !
>mpse1 %>% mp_cal_abundance(.abundance=Abundance, force=TRUE) %>% mp_diff_analysis(.abundance=RelAbundanceBySample, .group=group, force=T, relative=F)

Or you can perform mp_diff_analysis directly.

mpse1 %>% mp_diff_analysis(.abundance=Abundance, force=TRUE, .group=group)

alienzj commented 2 years ago

@xiangpin Hi, Shuangbin, thanks for your quick reply. After your meticulous and patient explanation, I began to gradually understand the processing logic of MPSE. I try to reproduce it , but the result does not seem to be right.

alienzj commented 2 years ago

I performed the LEfSe analysis using LEfSe python package, it can identified some biomarker. But when I use the mp_diff_analysis, cann't identify any biomarker, like above example.

I did not correct the relative abundance with sampling depth as you suggested.

xiangpin commented 2 years ago

This is because the default of LEfSe does not perform the fdr correction for the first test. But the default of mp_diff_analysis set the method of filter pvalue of the first test to fdr (p.adjust and filter.p parameter). In addition, the method to check which group is more abundant in LEfSe is median, but the method of mp_diff_analysis is generalizedFC (fc.method parameter)

> library(MicrobiotaProcess)
> file1 <- system.file("extdata/MetaPhlAn", "metaphlan_test.txt", package="MicrobiotaProcess")
> sample.file <- system.file("extdata/MetaPhlAn", "sample_test.txt", package="MicrobiotaProcess")
> mpse1 <- mp_import_metaphlan(profile=file1, mapfilename=sample.file)
> mpse1
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 11
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group taxid  Kingdom Phylum Class Order Family Genus
   <chr>  <chr>       <dbl> <chr> <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 s__Me… GupDM_…     0.596 testA 2157|… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
 2 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 3 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 4 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 5 s__Bi… GupDM_…     0.948 testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 7 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 8 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 9 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
10 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
# … with 5,250 more rows
> mpse1 %>% mp_cal_abundance(.abundance=Abundance, force=TRUE, relative=FALSE) %>% mp_diff_analysis(.abundance=Abundance, force=TRUE, relative=FALSE, .group=group, filter.p="pvalue") 
The otutree is empty in the MPSE object!
# A MPSE-tibble (MPSE object) abstraction: 5,260 × 17
# OTU=263 | Samples=20 | Assays=Abundance | Taxanomy=Kingdom, Phylum, Class, Order, Family, Genus
   OTU    Sample  Abundance group taxid  Kingdom Phylum Class Order Family Genus
   <chr>  <chr>       <dbl> <chr> <chr>  <chr>   <chr>  <chr> <chr> <chr>  <chr>
 1 s__Me… GupDM_…     0.596 testA 2157|… k__Arc… p__Eu… c__M… o__M… f__Me… g__M…
 2 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 3 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 4 s__Ac… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__A… f__Ac… g__A…
 5 s__Bi… GupDM_…     0.948 testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 6 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 7 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 8 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
 9 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
10 s__Bi… GupDM_…     0     testA 2|201… k__Bac… p__Ac… c__A… o__B… f__Bi… g__B…
# … with 5,250 more rows, and 6 more variables: LDAupper <dbl>, LDAmean <dbl>, LDAlower <dbl>,
#   Sign_group <chr>, pvalue <dbl>, fdr <dbl>

alienzj commented 2 years ago

@xiangpin Hi, Shuangbin, thanks for your reply.

The above example can be reproduced.

Can I use median as fc.method ?

alienzj commented 2 years ago

I also have a another question, If I do twice mp_diff_analysis, and got two tree visualization, can be merge two tree into one tree ? If we have many samples, can we just show the mean relative abundance of each group in tree visualization ?

xiangpin commented 2 years ago

Can I use median as fc.method ? Yes, you can specify fc.method to compare_median to use median, or compare_mean to use mean.
If we have many samples, can we just show the mean relative abundance of each group in tree visualization ? Yes, you can run mp_cal_abundance with specified .group to calculate the relative abundance or (abundance) for each group. You can refer to the following example.

> mouse.time.mpse %>% mp_cal_abundance(.abundance=Abundance) %>% mp_cal_abundance(.abundance=Abundance, .group=time) %>% mp_diff_analysis(.abundance=RelRareAbundanceBySample, .group=time) %>% mp_extract_tree() -> trda
The otutree is empty in the MPSE object!
The RareAbundance was in the MPSE object, please check whether it has been rarefied !
The otutree is empty in the MPSE object!
The RareAbundance was in the MPSE object, please check whether it has been rarefied !
The otutree is empty in the MPSE object!

The first mp_cal_abundance is to calculate the relative abundance (default relative=TRUE, and rarefied will be done in the internal default force=TRUE) of all taxonomy for each sample, The second mp_cal_abundance is to calculate the relative abundance (the same as before) of all taxonomy for each group. And the results will be added to the associated data of treedata with the nested format.

You can print the treedata to view its format and structure. And you can use ggtree and ggtreeExtra to visualize it.

> trda
'treedata' S4 object'.

...@ phylo:

Phylogenetic tree with 232 tips and 218 internal nodes.

Tip labels:
  OTU_58, OTU_67, OTU_231, OTU_188, OTU_150, OTU_207, ...
Node labels:
  r__root, k__Bacteria, p__Actinobacteria, p__Bacteroidetes, p__Cyanobacteria, p__Deinococcus-Thermus, ...

Rooted; no branch lengths.

with the following features available:
        'nodeClass',    'nodeDepth',    'RareAbundanceBytime',  'RareAbundanceBySample',
        'LDAupper',     'LDAmean',      'LDAlower',     'Sign_time',    'pvalue',       'fdr'.

# The associated data tibble abstraction: 450 × 13
# The 'node', 'label' and 'isTip' are from the phylo tree.
    node label   isTip nodeClass nodeDepth RareAbundanceBytime RareAbundanceByS…
   <dbl> <chr>   <lgl> <chr>         <dbl> <list>              <list>
 1     1 OTU_58  TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 2     2 OTU_67  TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 3     3 OTU_231 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 4     4 OTU_188 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 5     5 OTU_150 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 6     6 OTU_207 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 7     7 OTU_5   TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 8     8 OTU_80  TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
 9     9 OTU_163 TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
10    10 OTU_1   TRUE  OTU               8 <tibble [2 × 3]>    <tibble [19 × 4]>
# … with 440 more rows, and 6 more variables: LDAupper <dbl>, LDAmean <dbl>,
#   LDAlower <dbl>, Sign_time <chr>, pvalue <dbl>, fdr <dbl>
> trda %>% tidyr::unnest(RareAbundanceBytime)
# A tbl_df is returned for independent data analysis.
# A tibble: 898 × 15
    node label   isTip nodeClass nodeDepth time  RareAbundanceBy… RelRareAbundanc…
   <dbl> <chr>   <lgl> <chr>         <dbl> <chr>            <int>            <dbl>
 1     1 OTU_58  TRUE  OTU               8 Early                0          0
 2     1 OTU_58  TRUE  OTU               8 Late                 0          0
 3     2 OTU_67  TRUE  OTU               8 Early               18          0.0794
 4     2 OTU_67  TRUE  OTU               8 Late               113          0.449
 5     3 OTU_231 TRUE  OTU               8 Early                0          0
 6     3 OTU_231 TRUE  OTU               8 Late                 1          0.00397
 7     4 OTU_188 TRUE  OTU               8 Early                3          0.0132
 8     4 OTU_188 TRUE  OTU               8 Late                 7          0.0278
 9     5 OTU_150 TRUE  OTU               8 Early                8          0.0353
10     5 OTU_150 TRUE  OTU               8 Late                 5          0.0199
# … with 888 more rows, and 7 more variables: RareAbundanceBySample <list>,
#   LDAupper <dbl>, LDAmean <dbl>, LDAlower <dbl>, Sign_time <chr>,
#   pvalue <dbl>, fdr <dbl>
>

Just make a small modification to the example of vignettes

p1 <- trda %>%
      ggtree(
        layout="radial",
        size = 0.3
      ) +
      geom_point(
        data = td_filter(!isTip),
        fill="white",
        size=1,
        shape=21
      )
# display the high light of phylum clade.
p2 <- p1 +
      geom_hilight(
        #data = td_filter(nodeClass == "Phylum"), # This is supported by ggtree >=3.1.3
        mapping = aes(subset = nodeClass == "Phylum",
                      node = node,
                      fill = label)
      )
# display the relative abundance of features(OTU)
p3 <- p2 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         data = td_unnest(RareAbundanceBytime),
         geom = geom_star,
         mapping = aes(
                       x = time,
                       size = RelRareAbundanceBytime,
                       fill = time,
                       subset = RelRareAbundanceBytime > 0
                   ),
         starshape = 13,
         starstroke = 0.25,
         offset = 0.04,
         pwidth = 0.1,
         grid.params = list(linetype=2)
      ) +
      scale_size_continuous(
         name="Relative Abundance (%)",
         range = c(1, 3)
      ) +
      scale_fill_manual(values=c("#1B9E77", "#D95F02"))
# display the tip labels of taxa tree
p4 <- p3 + geom_tiplab(size=2, offset=2)
# display the LDA of significant OTU.
p5 <- p4 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         geom = geom_col,
         mapping = aes(
                       x = LDAmean,
                       fill = Sign_time,
                       subset = !is.na(LDAmean)
                       ),
         orientation = "y",
         offset = 0.3,
         pwidth = 0.5,
         axis.params = list(axis = "x",
                            title = "Log10(LDA)",
                            title.height = 0.01,
                            title.size = 2,
                            text.size = 1.8,
                            vjust = 1),
         grid.params = list(linetype = 2)
      )

# display the significant (FDR) taxonomy after kruskal.test (default)
p6 <- p5 +
      ggnewscale::new_scale("size") +
      geom_point(
         data=td_filter(!is.na(fdr)),
         mapping = aes(size = -log10(fdr),
                       fill = Sign_time,
                       ),
         shape = 21,
      ) +
      scale_size_continuous(range=c(1, 3)) +
      scale_fill_manual(values=c("#1B9E77", "#D95F02"))

p6 <- p6 + theme(
           legend.key.height = unit(0.3, "cm"),
           legend.key.width = unit(0.3, "cm"),
           legend.spacing.y = unit(0.02, "cm"),
           legend.text = element_text(size = 7),
           legend.title = element_text(size = 9),
          )
p6

PS: The display of results can be diverse, this example is just one of them, more types can be displayed using ggtree and ggtreeExtra easily since they inherit the ggplot2 grammar.

alienzj commented 2 years ago

@xiangpin Hi, Shuangbin, thanks for your reply. I was really amazed by the streaming operation in MPSE, you help me a lot, thanks so much!

I used fc.method="compare_median" and found that the LDA score calculated by mp_diff_analysis tends to be larger. I don't know much about the calculation details inside LEfSe, maybe I have not set the same parameters.

xiangpin commented 2 years ago

The LDA score was calculated using multiple iteration subset datasets (extract from the original dataset (default relative abundance table of all taxonomy)), which is random. To make the analysis reproducible, An random seed (seed parameter) was introduced through withr package in the internal of mp_diff_analysis. However, The method of LEfSe might have a problem, it used the random seed in the function, which might cause the subset dataset will be the same for each subset. This means it is not cross-validation-like.

alienzj commented 2 years ago

@xiangpin Hi, Shuangbin, thanks for your answer.

The method ofLEfSemight have a problem: You means MPSE or LEfSe python package ?

xiangpin commented 2 years ago

LEfSe python package

alienzj commented 2 years ago

OK, I choose to use MPSE, and also recommand it to my friends and collegues.

Thanks so much!

xiangpin commented 2 years ago

Thanks, that is great.

alienzj commented 2 years ago

Running:

tree_ogd_ce_t2$data %>%
tidyr::unnest(AbundanceBySample) %>%
select(label, nodeClass, Sample, group, RelAbundanceBySample)

Got:

# A tibble: 1,155 × 5
   label                          nodeClass Sample        group RelAbundanceByS…
   <chr>                          <chr>     <chr>         <chr>            <dbl>
 1 s__Microbacterium_ginsengisoli OTU       CLN-control-A Cont…             0   
 2 s__Microbacterium_ginsengisoli OTU       CLN-control-B Cont…            89.0 
 3 s__Microbacterium_ginsengisoli OTU       CLN-control-C Cont…            92.5 
 4 s__Microbacterium_ginsengisoli OTU       CLN-corner-A  Corn…            75.0 
 5 s__Microbacterium_ginsengisoli OTU       CLN-corner-B  Corn…            77.7 
 6 s__Microbacterium_ginsengisoli OTU       CLN-corner-C  Corn…             9.68
 7 s__Microbacterium_ginsengisoli OTU       CLN-endo-A    Endo             92.9 
 8 s__Microbacterium_ginsengisoli OTU       CLN-endo-B    Endo             77.6 
 9 s__Microbacterium_ginsengisoli OTU       CLN-endo-C    Endo             76.6 
10 s__Microbacterium_ginsengisoli OTU       Empty-A       Empty            20.3 
# … with 1,145 more rows

Running:

tree_ogd_ce_t3 <- tree_ogd_ce_t2 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         data = td_unnest(AbundanceBySample),
         geom = geom_star,
         mapping = aes(x = fct_reorder(Sample, group, .fun=min),
                       size = RelAbundanceBySample,
                       fill = group,
                       subset = RelAbundanceBySample > 0
                   ),
         starshape = 13,
         starstroke = 0.25,
         offset = 0.02,
         pwidth = 0.05,
         grid.params = list(linetype=2)
      ) + scale_size_continuous(
         name="Relative Abundance (%)",
         range = c(1, 3)
      ) + scale_fill_manual(values=group_color)

print(tree_ogd_ce_t3)

Got:

Error in eval(parse(text = quo_name(object$mapping$subset))) : 
  object 'RelAbundanceBySample' not found

Any suggestion ? Thanks ~

xiangpin commented 2 years ago

Would you mind giving me the file? Or you can try the mp_plot_diff_res which is a new feature. you can update MicrobiotaProcess with remotes::install_github("YuLab-SMU/MicrobiotaProcess").

alienzj commented 2 years ago

Thanks so much ~ I will try mp_plot_diff_res.

tree_diff_mpse_ogd_control_empty.zip You can use this file to debug.

xiangpin commented 2 years ago

I can plot the object with the following codes.

library(ggtreeExtra)                                                                                                                                                                                               library(ggtree)
library(MicrobiotaProcess)
library(ggstar)
library(forcats)
library(ggplot2)

trda <- readRDS("./tree_diff_mpse_ogd_control_empty.rds")

trda

p <- ggtree(trda) +
     geom_hilight(
         mapping = aes(
           subset = nodeClass == "Phylum",
           node = node,
           fill = label
         )
     )
p2 <- p +
      ggnewscale::new_scale_fill() +
      geom_fruit(
         data = td_unnest(AbundanceBySample, names_repair=tidyr::tidyr_legacy),
         geom = geom_star,
         mapping = aes(
            x = fct_reorder(Sample, group, .fun=min),
            size = RelAbundanceBySample,
            fill = group,
            subset = RelAbundanceBySample > 0
         ),
         starshape = 13,
         pwidth = 2,
         grid.params = list(linetype=2)
      )
p3 <- p2 +
      ggnewscale::new_scale("fill") +
      geom_fruit(
         geom = geom_col,
         mapping = aes(
                       x = LDAmean,
                       fill = Sign_group,
                       subset = !is.na(LDAmean)
                       ),
         orientation = "y",
         offset = 0.1,
         pwidth = 1,
         axis.params = list(axis = "x",
                            title = "Log10(LDA)",
                            title.height = 0.1,
                            title.size = 2,
                            text.size = 1.8,
                            vjust = 1),
         grid.params = list(linetype = 1)
      ) +
      geom_tiplab(
         size = 6,
         as_ylab = TRUE
      ) +
      theme(
           legend.key.height = unit(0.3, "cm"),
           legend.key.width = unit(0.3, "cm"),
           legend.spacing.y = unit(0.02, "cm"),
           legend.text = element_text(size = 7),
           legend.title = element_text(size = 9),
      ) +
      scale_x_continuous(expand=c(0, 0.05))
p3

xx2

PS : I think it is better to run td_unnest with names_repair=tidyr::tidyr_legacy. this is because when there are the same column names between the tibble before unnest and the tibble after unnest. Please see the illustration below.

> library(tidyr)
> library(dplyr)

Attaching package: ‘dplyr’

The following objects are masked from ‘package:stats’:

    filter, lag

The following objects are masked from ‘package:base’:

    intersect, setdiff, setequal, union

> dat <- data.frame(A=c(1, 2, 3), B=c(4, 5, 6))
> dat %>% nest(a=c(A, B))
# A tibble: 1 × 1
  a
  <list>
1 <tibble [3 × 2]>
> dat %>% nest(a=c(A, B)) %>% mutate(A=1, B=2)
# A tibble: 1 × 3
  a                    A     B
  <list>           <dbl> <dbl>
1 <tibble [3 × 2]>     1     2
> dat %>% nest(a=c(A, B)) %>% mutate(A=1, B=2) %>% unnest(a)
Error: Names must be unique.
✖ These names are duplicated:
  * "A" at locations 1 and 3.
  * "B" at locations 2 and 4.
ℹ Use argument `names_repair` to specify repair strategy.
Run `rlang::last_error()` to see where the error occurred.
> dat %>% nest(a=c(A, B)) %>% mutate(A=1, B=2) %>% unnest(a, names_repair=tidyr_legacy)
# A tibble: 3 × 4
      A     B    A1    B1
  <dbl> <dbl> <dbl> <dbl>
1     1     4     1     2
2     2     5     1     2
3     3     6     1     2

alienzj commented 2 years ago

That's great ! Thanks so much !

alienzj commented 2 years ago

@xiangpin Dear xiangpin, could you please show the R session info to me when run the above code ?

I found I can't reproduce the results of above code, there is still the same error: Error in eval(parse(text = quo_name(object$mapping$subset))) : object 'RelAbundanceBySample' not found.

Below is my R session info:

R version 4.0.5 (2021-03-31)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: Ubuntu 20.04.3 LTS

Matrix products: default
BLAS/LAPACK: /home/zhujie/.conda/envs/bioenv3.8/lib/libopenblasp-r0.3.17.so

locale:
 [1] LC_CTYPE=en_HK.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_HK.UTF-8        LC_COLLATE=en_HK.UTF-8    
 [5] LC_MONETARY=en_HK.UTF-8    LC_MESSAGES=en_HK.UTF-8   
 [7] LC_PAPER=en_HK.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_HK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] tools     stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] ggtreeExtra_1.0.2       ggstar_1.0.2            ggplotify_0.1.0        
 [4] ggtree_3.3.0.900        ggpubr_0.4.0            forcats_0.5.1          
 [7] stringr_1.4.0           dplyr_1.0.7             purrr_0.3.4            
[10] readr_2.0.2             tidyr_1.1.4             tibble_3.1.5           
[13] ggplot2_3.3.5           tidyverse_1.3.1         MicrobiotaProcess_1.7.2

loaded via a namespace (and not attached):
  [1] ggnewscale_0.4.5            TH.data_1.1-0              
  [3] colorspace_2.0-2            ggsignif_0.6.3             
  [5] ellipsis_0.3.2              modeltools_0.2-23          
  [7] rio_0.5.27                  XVector_0.30.0             
  [9] GenomicRanges_1.42.0        fs_1.5.0                   
 [11] aplot_0.1.1                 rstudioapi_0.13            
 [13] farver_2.1.0                ggrepel_0.9.1              
 [15] fansi_0.5.0                 mvtnorm_1.1-3              
 [17] lubridate_1.8.0             coin_1.4-2                 
 [19] xml2_1.3.2                  codetools_0.2-18           
 [21] splines_4.0.5               libcoin_1.0-9              
 [23] jsonlite_1.7.2              broom_0.7.9                
 [25] cluster_2.1.2               dbplyr_2.1.1               
 [27] compiler_4.0.5              httr_1.4.2                 
 [29] backports_1.2.1             assertthat_0.2.1           
 [31] Matrix_1.3-2                lazyeval_0.2.2             
 [33] cli_3.1.0                   gtable_0.3.0               
 [35] glue_1.4.2                  GenomeInfoDbData_1.2.4     
 [37] Rcpp_1.0.7                  carData_3.0-4              
 [39] Biobase_2.50.0              cellranger_1.1.0           
 [41] vctrs_0.3.8                 Biostrings_2.58.0          
 [43] ape_5.5                     nlme_3.1-153               
 [45] gghalves_0.1.1              iterators_1.0.13           
 [47] ggalluvial_0.12.3           openxlsx_4.2.4             
 [49] rvest_1.0.2                 lifecycle_1.0.1            
 [51] rstatix_0.7.0               zlibbioc_1.36.0            
 [53] MASS_7.3-54                 zoo_1.8-9                  
 [55] scales_1.1.1                ragg_1.2.0                 
 [57] hms_1.1.1                   MatrixGenerics_1.2.1       
 [59] parallel_4.0.5              SummarizedExperiment_1.20.0
 [61] sandwich_3.0-1              curl_4.3.2                 
 [63] gridExtra_2.3               dtplyr_1.1.0               
 [65] ggfun_0.0.4                 yulab.utils_0.0.4.901      
 [67] stringi_1.7.4               S4Vectors_0.28.1           
 [69] foreach_1.5.1               tidytree_0.3.5             
 [71] permute_0.9-5               BiocGenerics_0.36.1        
 [73] zip_2.2.0                   GenomeInfoDb_1.26.7        
 [75] systemfonts_1.0.3           rlang_0.4.12               
 [77] pkgconfig_2.0.3             matrixStats_0.61.0         
 [79] bitops_1.0-7                lattice_0.20-44            
 [81] labeling_0.4.2              treeio_1.19.0              
 [83] patchwork_1.1.1             tidyselect_1.1.1           
 [85] magrittr_2.0.1              R6_2.5.1                   
 [87] IRanges_2.24.1              generics_0.1.1             
 [89] multcomp_1.4-17             DelayedArray_0.16.3        
 [91] DBI_1.1.1                   pillar_1.6.4               
 [93] haven_2.4.3                 foreign_0.8-81             
 [95] withr_2.4.2                 mgcv_1.8-36                
 [97] abind_1.4-5                 survival_3.2-13            
 [99] RCurl_1.98-1.5              modelr_0.1.8               
[101] crayon_1.4.2                car_3.0-11                 
[103] utf8_1.2.2                  tzdb_0.1.2                 
[105] grid_4.0.5                  readxl_1.3.1               
[107] data.table_1.14.2           vegan_2.5-7                
[109] digest_0.6.28               reprex_2.0.1               
[111] textshaping_0.3.4           gridGraphics_0.5-1         
[113] stats4_4.0.5                munsell_0.5.0

xiangpin commented 2 years ago

I think it can work by updating ggtreeExtra. Ps: you also can check whether the RelAbundanceBySample is exits when you unnest the treedata.

> tr.da %>% as_tibble %>% colnames()
 [1] "parent"                       "node"
 [3] "label"                        "nodeClass"
 [5] "nodeDepth"                    "AbundanceBySample"
 [7] "AbundanceByendoscopy_type"    "AbundanceBygroup"
 [9] "AbundanceByparticle_size"     "RelAbundanceBySampleBySample"
[11] "LDAupper"                     "LDAmean"
[13] "LDAlower"                     "Sign_group"
[15] "pvalue"                       "fdr"
> tr.da %>% unnest(AbundanceBySample) %>% colnames()
# A tbl_df is returned for independent data analysis.
 [1] "node"                         "label"
 [3] "isTip"                        "nodeClass"
 [5] "nodeDepth"                    "Sample"
 [7] "endoscopy_type"               "group"
 [9] "particle_size"                "Abundance"
[11] "RelAbundanceBySample"         "AbundanceByendoscopy_type"
[13] "AbundanceBygroup"             "AbundanceByparticle_size"
[15] "RelAbundanceBySampleBySample" "LDAupper"
[17] "LDAmean"                      "LDAlower"
[19] "Sign_group"                   "pvalue"
[21] "fdr"

This is my R session info

R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS:   /mnt/d/UbuntuApps/R/4.1.1/lib/R/lib/libRblas.so
LAPACK: /mnt/d/UbuntuApps/R/4.1.1/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] forcats_0.5.1               ggplot2_3.3.5
[3] ggstar_1.0.2                MicrobiotaProcess_1.7.3.990
[5] ggtree_3.3.0.900            ggtreeExtra_1.5.1

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.6.0        Biobase_2.54.0
 [3] tidyr_1.1.4                 jsonlite_1.7.2
 [5] splines_4.1.1               foreach_1.5.1
 [7] assertthat_0.2.1            stats4_4.1.1
 [9] yulab.utils_0.0.4.901       coin_1.4-2
[11] GenomeInfoDbData_1.2.7      ggrepel_0.9.1
[13] pillar_1.6.4                lattice_0.20-45
[15] glue_1.5.0                  GenomicRanges_1.46.0
[17] XVector_0.34.0              ggsignif_0.6.3
[19] colorspace_2.0-2            sandwich_3.0-1
[21] ggfun_0.0.4                 Matrix_1.3-4
[23] ggnewscale_0.4.5            pkgconfig_2.0.3
[25] zlibbioc_1.40.0             purrr_0.3.4
[27] mvtnorm_1.1-3               patchwork_1.1.1
[29] tidytree_0.3.6              scales_1.1.1
[31] ggplotify_0.1.0             tibble_3.1.6
[33] mgcv_1.8-38                 generics_0.1.1
[35] IRanges_2.28.0              ellipsis_0.3.2
[37] withr_2.4.2                 TH.data_1.1-0
[39] SummarizedExperiment_1.24.0 BiocGenerics_0.40.0
[41] lazyeval_0.2.2              cli_3.1.0
[43] survival_3.2-13             magrittr_2.0.1
[45] crayon_1.4.2                fansi_0.5.0
[47] nlme_3.1-153                MASS_7.3-54
[49] vegan_2.5-7                 tools_4.1.1
[51] lifecycle_1.0.1             matrixStats_0.61.0
[53] multcomp_1.4-17             aplot_0.1.1
[55] S4Vectors_0.32.0            munsell_0.5.0
[57] cluster_2.1.2               DelayedArray_0.20.0
[59] Biostrings_2.62.0           compiler_4.1.1
[61] GenomeInfoDb_1.30.0         gridGraphics_0.5-1
[63] rlang_0.4.12                grid_4.1.1
[65] RCurl_1.98-1.5              iterators_1.0.13
[67] bitops_1.0-7                gtable_0.3.0
[69] codetools_0.2-18            DBI_1.1.1
[71] R6_2.5.1                    gridExtra_2.3
[73] zoo_1.8-9                   dplyr_1.0.7
[75] utf8_1.2.2                  libcoin_1.0-9
[77] treeio_1.18.0               permute_0.9-5
[79] ape_5.5-2                   modeltools_0.2-23
[81] parallel_4.1.1              Rcpp_1.0.7
[83] vctrs_0.3.8                 tidyselect_1.1.1

alienzj commented 2 years ago

Thanks so much ! I updated ggtreeExtra, now it work.

ilma74 commented 2 years ago

Hi, I am running biomarker identification step deres <- diff_analysis(obj = ps, classgroup = "Treatment", mlfun = "lda", filtermod = "pvalue", firstcomfun = "kruskal_test", firstalpha = 0.05, strictmod = TRUE, secondcomfun = "wilcox_test", subclmin = 3, subclwilc = TRUE, secondalpha = 0.01, lda=3) and am getting Error in lda.default(x, grouping, ...) : variables 33 37 appear to be constant within groups.

Is there a way how to filter out the constant variables? I am limited in my R skills and suggestions would be welcome :) Thanks, Ilma

xiangpin commented 2 years ago

@ilma74 hello, which version of MicrobiotaProcess are you using? I cannot reproduce your issue. Would you mind sending me the dataset by email? I will not distribute this data without your permission. My email is xshuangbin@163.com

xiangpin commented 2 years ago

@ilma74 Hello, this bug is fixed by the GitHub version, you can reinstall it via remotes::install_github("YuLab-SMU/MicrobiotaProcess"), then try again.

ilma74 commented 2 years ago

Thanks! Now it works. best Ilma

From: Shuangbin Xu @.> Sent: 27 January 2022 10:56 To: YuLab-SMU/MicrobiotaProcess @.> Cc: Tapio Ilma (LUKE) @.>; Mention @.> Subject: Re: [YuLab-SMU/MicrobiotaProcess] Biomarker discovery explanation and new package's feature discussion (#35)

@ilma74https://github.com/ilma74 Hello, this bug is fixed by the GitHub version, you can reinstall it via remotes::install_github("YuLab-SMU/MicrobiotaProcess"), then try again.

— Reply to this email directly, view it on GitHubhttps://github.com/YuLab-SMU/MicrobiotaProcess/issues/35#issuecomment-1022983476, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AOS3PVCQILHWNE4XQVFU6STUYECBTANCNFSM5FZSSNUQ. Triage notifications on the go with GitHub Mobile for iOShttps://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Androidhttps://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub. You are receiving this because you were mentioned.Message ID: @.***>

taekwonleo-yuhao commented 2 years ago

Dear Xiangpin,

I would like to ask if MicrobiotaProcess can help us to quickly assess patterns of associations between numerical variables in the dataset, such as microbial abundances of identified biomarkers and other metadata. If you do not plan to add this kind of function, could you please give me some tips on how to do it?

Thanks, Yuhao

xiangpin commented 2 years ago

The mp_plot_alpha of MicrobiotaProcess can help you to view the pattern. You can refer to the following codes

> library(MicrobiotaProcess)
MicrobiotaProcess v1.7.8.990 For help:
https://github.com/YuLab-SMU/MicrobiotaProcess/issues

If you use MicrobiotaProcess in published research, please cite the
paper:

S Xu, L Zhan, W Tang, Z Dai, L Zhou, T Feng, M Chen, S Liu, X Fu, T Wu,
E Hu, G Yu. MicrobiotaProcess: A comprehensive R package for managing
and analyzing microbiome and other ecological data within the tidy
framework. 04 February 2022, PREPRINT (Version 1) available at Research
Square [https://doi.org/10.21203/rs.3.rs-1284357/v1]

This message can be suppressed by:
suppressPackageStartupMessages(library(MicrobiotaProcess))
> data(mouse.time.mpse)
> mouse.time.mpse
# A MPSE-tibble (MPSE object) abstraction: 4,142 × 11
# OTU=218 | Samples=19 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
   OTU    Sample Abundance time  Kingdom Phylum Class Order Family Genus Species
   <chr>  <chr>      <int> <chr> <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>
 1 OTU_1  F3D0         579 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 2 OTU_2  F3D0         345 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 3 OTU_3  F3D0         449 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 4 OTU_4  F3D0         430 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 5 OTU_5  F3D0         154 Early k__Bac… p__Ba… c__B… o__B… f__Ba… g__B… s__un_…
 6 OTU_6  F3D0         470 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 7 OTU_7  F3D0         282 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 8 OTU_8  F3D0         184 Early k__Bac… p__Ba… c__B… o__B… f__Ri… g__A… s__un_…
 9 OTU_9  F3D0          45 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
10 OTU_10 F3D0         158 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
# … with 4,132 more rows
> mouse.time.mpse %>% mp_extract_sample %>% dplyr::mutate(test=abs(rnorm(19))) %>% select(-time) -> numeric_var
> mouse.time.mpse %>% left_join(numeric_var, by='Sample')
# A MPSE-tibble (MPSE object) abstraction: 4,142 × 12
# OTU=218 | Samples=19 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
   OTU    Sample Abundance time   test Kingdom   Phylum Class Order Family Genus
   <chr>  <chr>      <int> <chr> <dbl> <chr>     <chr>  <chr> <chr> <chr>  <chr>
 1 OTU_1  F3D0         579 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Mu… g__u…
 2 OTU_2  F3D0         345 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Mu… g__u…
 3 OTU_3  F3D0         449 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Mu… g__u…
 4 OTU_4  F3D0         430 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Mu… g__u…
 5 OTU_5  F3D0         154 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Ba… g__B…
 6 OTU_6  F3D0         470 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Mu… g__u…
 7 OTU_7  F3D0         282 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Mu… g__u…
 8 OTU_8  F3D0         184 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Ri… g__A…
 9 OTU_9  F3D0          45 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Mu… g__u…
10 OTU_10 F3D0         158 Early 0.652 k__Bacte… p__Ba… c__B… o__B… f__Mu… g__u…
# … with 4,132 more rows, and 1 more variable: Species <chr>
> mouse.time.mpse %>% left_join(numeric_var, by='Sample') %>% mp_rrarefy() %>% mp_cal_alpha()
# A MPSE-tibble (MPSE object) abstraction: 4,142 × 19
# OTU=218 | Samples=19 | Assays=Abundance, RareAbundance | Taxonomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
   OTU    Sample Abundance RareAbundance time   test Observe Chao1   ACE Shannon
   <chr>  <chr>      <int>         <int> <chr> <dbl>   <dbl> <dbl> <dbl>   <dbl>
 1 OTU_1  F3D0         579           214 Early 0.652     104  104.  105.    3.88
 2 OTU_2  F3D0         345           116 Early 0.652     104  104.  105.    3.88
 3 OTU_3  F3D0         449           179 Early 0.652     104  104.  105.    3.88
 4 OTU_4  F3D0         430           167 Early 0.652     104  104.  105.    3.88
 5 OTU_5  F3D0         154            54 Early 0.652     104  104.  105.    3.88
 6 OTU_6  F3D0         470           174 Early 0.652     104  104.  105.    3.88
 7 OTU_7  F3D0         282           115 Early 0.652     104  104.  105.    3.88
 8 OTU_8  F3D0         184            74 Early 0.652     104  104.  105.    3.88
 9 OTU_9  F3D0          45            16 Early 0.652     104  104.  105.    3.88
10 OTU_10 F3D0         158            59 Early 0.652     104  104.  105.    3.88
# … with 4,132 more rows, and 9 more variables: Simpson <dbl>, Pielou <dbl>,
#   Kingdom <chr>, Phylum <chr>, Class <chr>, Order <chr>, Family <chr>,
#   Genus <chr>, Species <chr>
> mouse.time.mpse %>% left_join(numeric_var, by='Sample') %>% mp_rrarefy() %>% mp_cal_alpha() %>% mp_plot_alpha(.group=test, .alpha=Observe)

So you can use the biomarker table replace the numeric_var

> mouse.time.mpse %>% mp_rrarefy() %>% mp_diff_analysis(.abundance= RareAbundance, .group=time) %>% mp_extract_taxatree %>% dplyr::filter(nodeClass=='Genus' & !is.na(LDAmean), keep.td=F) %>% tidyr::unnest(RareAbundanceBySample) %>% select(label, Sample, RelRareAbundanceBySample) %>% tidyr::pivot_wider(id_cols=Sample, names_from=label, values_from=RelRareAbundanceBySample) -> genus.tb
> mouse.time.mpse %>% left_join(genus.tb) %>% mp_rrarefy() %>% mp_cal_alpha() %>% mp_plot_alpha(.group=g__Pseudomonas, .alpha=Shannon)

xx3

Or you can use ggstatsplot to visualize it

> library(ggplot2)
> library(ggstatsplot)
> mouse.time.mpse %>% left_join(genus.tb) %>% mp_rrarefy() %>% mp_cal_alpha() %>% mp_extract_sample %>% ggscatterstats(x=Observe, y=g__Pseudomonas, type='no', ggplot.component=aes(color=time))

xx2

shaodongyan commented 2 years ago

Hello,I can't use " fc.method="compare_median" ,The Error is Error in rownames<-(*tmp*, value = vars) : 不能给没有维度的对象设'rownames' . Why?

shaodongyan commented 2 years ago

Hello,I can't use " fc.method="compare_median" ,The Error is Error in rownames<-(*tmp*, value = vars) : 不能给没有维度的对象设'rownames' . Why?

compare_mean can make sense,I don't know Why.