YuLab-SMU / MicrobiotaProcess

A comprehensive R package for deep mining microbiome
https://www.sciencedirect.com/science/article/pii/S2666675823000164

Include the percentage values in the stacked plots #100

Open antoniobio opened 1 year ago

antoniobio commented 1 year ago

Hello, is it possible to include the percentage values in the stacked plots for each OTU, or at least to extract those values?

Best regards

xiangpin commented 1 year ago

Yes. Many of the plot objects are ggplot or ggtree objects, so you can add other layers (from the ggplot2 ecosystem) to the original plot. For example, to do what you want, first use mp_plot_abundance to obtain the ggplot object, then use geom_fit_text from ggfittext to display the percentage values.

> library(ggplot2)
> library(ggfittext)
> library(MicrobiotaProcess)
MicrobiotaProcess v1.13.2.993 For help:
https://github.com/YuLab-SMU/MicrobiotaProcess/issues

If you use MicrobiotaProcess in published research, please cite the
paper:

Shuangbin Xu, Li Zhan, Wenli Tang, Qianwen Wang, Zehan Dai, Lang Zhou,
Tingze Feng, Meijun Chen, Tianzhi Wu, Erqiang Hu, Guangchuang Yu.
MicrobiotaProcess: A comprehensive R package for deep mining
microbiome. The Innovation. 2023, 4(2):100388. doi:
10.1016/j.xinn.2023.100388

Export the citation to BibTex by citation('MicrobiotaProcess')

This message can be suppressed by:
suppressPackageStartupMessages(library(MicrobiotaProcess))

Attaching package: ‘MicrobiotaProcess’

The following object is masked from ‘package:stats’:

    filter

> data(mouse.time.mpse)
> mouse.time.mpse %>% mp_rrarefy(.abundance=Abundance) %>% mp_plot_abundance(.abundance=RareAbundance, .group=time, taxa.class=Phylum, topn = 20, order.by.feature = "p__Firmicutes", width = 4/5) -> p1
> p2 <- p1 + geom_fit_text(aes(label = paste0(round(RelRareAbundanceBySample,1), "%")), position=position_stack(vjust=.5), show.legend=F, color='white')
> p1 / p2

[image: the original stacked abundance plot (p1) above the same plot with percentage labels (p2)]
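The same labelling idea can be tried without MicrobiotaProcess or ggfittext. What follows is only a sketch on made-up data: the data frame `df`, its column names, and the 5% cutoff are assumptions for illustration, and plain `geom_text` is swapped in for `geom_fit_text`.

```r
library(ggplot2)

# Toy data (hypothetical values): one row per sample and taxon,
# with the relative abundance already expressed in percent
df <- data.frame(
  Sample = rep(c("S1", "S2"), each = 3),
  Phylum = rep(c("p__A", "p__B", "p__C"), times = 2),
  RelAbundance = c(60, 38.5, 1.5, 72, 27.2, 0.8)
)

# Label only segments of at least 5% so that tiny slices stay blank
# (the cutoff is an arbitrary choice for this sketch)
df$label <- ifelse(df$RelAbundance >= 5,
                   paste0(round(df$RelAbundance, 1), "%"),
                   "")

# position_stack(vjust = 0.5) places each label in the middle of its segment
p <- ggplot(df, aes(x = Sample, y = RelAbundance, fill = Phylum)) +
  geom_col(width = 4/5) +
  geom_text(aes(label = label),
            position = position_stack(vjust = 0.5),
            colour = "white")
```

Unlike `geom_fit_text`, `geom_text` does not shrink text to fit a segment, which is why the sketch blanks out small values by hand.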

xiangpin commented 1 year ago

Of course, you can also use mp_cal_abundance and mp_extract_abundance to obtain the abundance or relative abundance at a specific taxonomic level.

> # This calculates the relative abundance (argument: relative = T) directly from `Abundance`, without rarefaction (force = T)
> mouse.time.mpse %>% mp_cal_abundance(.abundance=Abundance, force=T, relative=T) -> mpse2
> mpse2 %>% mp_extract_abundance(taxa.class=OTU, topn='all')
# A tibble: 218 × 3
   label   nodeClass AbundanceBySample
   <fct>   <chr>     <list>
 1 OTU_67  OTU       <tibble [19 × 4]>
 2 OTU_231 OTU       <tibble [19 × 4]>
 3 OTU_188 OTU       <tibble [19 × 4]>
 4 OTU_150 OTU       <tibble [19 × 4]>
 5 OTU_207 OTU       <tibble [19 × 4]>
 6 OTU_5   OTU       <tibble [19 × 4]>
 7 OTU_1   OTU       <tibble [19 × 4]>
 8 OTU_2   OTU       <tibble [19 × 4]>
 9 OTU_3   OTU       <tibble [19 × 4]>
10 OTU_4   OTU       <tibble [19 × 4]>
# ℹ 208 more rows
# ℹ Use `print(n = ...)` to see more rows
> mpse2 %>% mp_extract_abundance(taxa.class=OTU, topn='all') %>% tidytree::unnest(AbundanceBySample)
# A tibble: 4,142 × 6
   label  nodeClass Sample Abundance RelAbundanceBySample time
   <fct>  <chr>     <chr>      <int>                <dbl> <chr>
 1 OTU_67 OTU       F3D0          24               0.368  Early
 2 OTU_67 OTU       F3D1           0               0      Early
 3 OTU_67 OTU       F3D141        16               0.329  Late
 4 OTU_67 OTU       F3D142        28               1.11   Late
 5 OTU_67 OTU       F3D143        10               0.397  Late
 6 OTU_67 OTU       F3D144        21               0.602  Late
 7 OTU_67 OTU       F3D145         7               0.120  Late
 8 OTU_67 OTU       F3D146         3               0.0773 Late
 9 OTU_67 OTU       F3D147        29               0.223  Late
10 OTU_67 OTU       F3D148        68               0.684  Late
# ℹ 4,132 more rows
# ℹ Use `print(n = ...)` to see more rows
>

Abundance is the original abundance used to calculate the relative abundance, and RelAbundanceBySample is the relative abundance derived from Abundance. The tibble is in long format, so it can be processed and visualized with the tidyverse ecosystem.
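As a small illustration of processing the long format with dplyr, the sketch below averages the per-sample relative abundance within each group. It uses only the first five OTU_67 rows from the printout above, with values as printed (rounded); the names `abund`, `group_means`, and `MeanRelAbundance` are made up for this example. A plain mean of per-sample percentages is an assumption about what "group value" should mean, and it is not necessarily how mp_plot_abundance aggregates when plot.group is used.

```r
library(dplyr)

# First five OTU_67 rows of the unnested tibble, values as printed (rounded)
abund <- data.frame(
  label = "OTU_67",
  Sample = c("F3D0", "F3D1", "F3D141", "F3D142", "F3D143"),
  RelAbundanceBySample = c(0.368, 0, 0.329, 1.11, 0.397),
  time = c("Early", "Early", "Late", "Late", "Late")
)

# Mean relative abundance per OTU and group (here: time)
group_means <- abund %>%
  group_by(label, time) %>%
  summarise(MeanRelAbundance = mean(RelAbundanceBySample), .groups = "drop")
```

On the real data, the same `group_by()` and `summarise()` calls would be applied to the result of `tidytree::unnest(AbundanceBySample)`.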

antoniobio commented 1 year ago

Thank you very much. When I use plot.group = TRUE, I get an error: object 'RelRareAbundanceBySample' not found

Is it possible to plot by group, using the replicate means of the relative abundance?

Best regards,

xiangpin commented 1 year ago

With plot.group = TRUE, RelRareAbundanceBySample is renamed to RelRareAbundanceBy + YourGroupName. For example, in the demo dataset YourGroupName is time. The suffix distinguishes per-sample values from values computed for a particular group.

> library(ggplot2)
> library(ggfittext)
> library(MicrobiotaProcess)
> data(mouse.time.mpse)
> mouse.time.mpse %>% mp_rrarefy(.abundance=Abundance) %>% mp_plot_abundance(.abundance=RareAbundance, .group=time, taxa.class=Phylum, topn = 20, width = 4/5, plot.group=T) -> p1
> p1$data
# A tibble: 18 × 5
   Phylum             nodeClass time  RareAbundanceBytime RelRareAbundanceBytime
   <fct>              <chr>     <chr>               <int>                  <dbl>
 1 p__Actinobacteria  Phylum    Early                  31                0.137
 2 p__Actinobacteria  Phylum    Late                  127                0.504
 3 p__Bacteroidetes   Phylum    Early               13489               59.5
 4 p__Bacteroidetes   Phylum    Late                18333               72.8
 5 p__Cyanobacteria   Phylum    Early                  15                0.0662
 6 p__Cyanobacteria   Phylum    Late                    5                0.0199
 7 p__Deinococcus-Th… Phylum    Early                   0                0
 8 p__Deinococcus-Th… Phylum    Late                    1                0.00397
 9 p__Firmicutes      Phylum    Early                8473               37.4
10 p__Firmicutes      Phylum    Late                 6471               25.7
11 p__Patescibacteria Phylum    Early                  69                0.304
12 p__Patescibacteria Phylum    Late                   32                0.127
13 p__Proteobacteria  Phylum    Early                  71                0.313
14 p__Proteobacteria  Phylum    Late                    8                0.0318
15 p__Tenericutes     Phylum    Early                 508                2.24
16 p__Tenericutes     Phylum    Late                  203                0.806
17 p__Verrucomicrobia Phylum    Early                   6                0.0265
18 p__Verrucomicrobia Phylum    Late                    0                0

In the data of the ggplot object, RelRareAbundanceBySample has been replaced by RelRareAbundanceBytime, so the mapping in geom_fit_text needs to be adjusted accordingly.

> p2 <- p1 + geom_fit_text(aes(label = paste0(round(RelRareAbundanceBytime,1), "%")), position=position_stack(vjust=.5), show.legend=F, color='white')
> p1 + p2

[image: the grouped stacked abundance plot (p1) beside the same plot with percentage labels (p2)]
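Because the column name depends on the group variable, it can be convenient to build it from the group name instead of typing it out. The following is a rough sketch on toy data, not something the package provides: `group_name` and `value_col` are hypothetical helper variables, the four rows are copied from the p1$data printout above, and plain `geom_text` stands in for `geom_fit_text`.

```r
library(ggplot2)

# Build the column name as "RelRareAbundanceBy" + group name
group_name <- "time"
value_col <- paste0("RelRareAbundanceBy", group_name)

# Four rows copied from the p1$data printout (two phyla only)
df <- data.frame(
  Phylum = c("p__Bacteroidetes", "p__Bacteroidetes",
             "p__Firmicutes", "p__Firmicutes"),
  time = c("Early", "Late", "Early", "Late"),
  RelRareAbundanceBytime = c(59.5, 72.8, 37.4, 25.7)
)

# Fail early if the expected column is not in the data
stopifnot(value_col %in% names(df))

# .data[[value_col]] looks the column up by its name stored in a string
p <- ggplot(df, aes(x = time, y = .data[[value_col]], fill = Phylum)) +
  geom_col() +
  geom_text(aes(label = paste0(round(.data[[value_col]], 1), "%")),
            position = position_stack(vjust = 0.5),
            colour = "white")
```

With a real plot, checking `names(p1$data)` first shows which column name to expect for a given `.group`.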

antoniobio commented 1 year ago

Many thanks,

Best regards