YuLab-SMU / MicrobiotaProcess

:microbe: A comprehensive R package for deep mining microbiome
https://www.sciencedirect.com/science/article/pii/S2666675823000164
182 stars 37 forks source link

How to label Taxanomy information in the taxa tree #44

Open taekwonleo-yuhao opened 2 years ago

taekwonleo-yuhao commented 2 years ago

Dear developer,

I have a question about how to display the Taxonomy information in the tip labels of the taxa tree.

In my case, I convert the phyloseq object into MPSE format and try to identify biomarkers.

mpse_ps1_TE3F_TE4F_16wk <- ps1_TE3F_TE4F_16wk  %>% as.MPSE
mpse_ps1_TE3F_TE4F_16wk <- mpse_ps1_TE3F_TE4F_16wk %>% mp_filter_taxa(.abundance = Abundance, min.abun = 1, min.prop = 0.05)

mpse_ps1_TE3F_TE4F_16wk <- mpse_ps1_TE3F_TE4F_16wk %>% mp_diff_analysis(
                                                                        .abundance = Aundance,
                                                                        relative = FALSE,
                                                                        .group = Strain,
                                                                        first.test.method = "kruskal_test",
                                                                        filter.p = "pvalue",
                                                                        first.test.alpha = 0.05,
                                                                        strict = TRUE,
                                                                        second.test.method = "wilcox_test",
                                                                        second.test.alpha = 0.05,
                                                                        subcl.min = 3,
                                                                        subcl.test = TRUE,
                                                                        ml.method = "lda",
                                                                        ldascore = 3,
                                                                        action = "add"
                                                                        )
mpse_ps1_TE3F_TE4F_16wk

# A MPSE-tibble (MPSE object) abstraction: 44,640 × 37
# OTU=288 | Samples=155 | Assays=Abundance, RareAbundance, RelRareAbundanceBySample | Taxonomy=Kingdom, Phylum, Class, Order, Family,
#   Genus, Species
   OTU   Sample Abundance RareAbundance RelRareAbundanc… X.SampleID BarcodeSequence LinkerPrimerSeq… Rev_comp Sex   ID    Strain Treatment
   <chr> <chr>      <int>         <int>            <dbl> <chr>      <chr>           <chr>            <chr>    <chr> <chr> <chr>  <chr>    
 1 TACG… TE3F-…      2011          1188            11.3  TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
 2 TACG… TE3F-…       233           136             1.30 TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
 3 TACG… TE3F-…      2007          1141            10.9  TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
 4 TACG… TE3F-…      1315           744             7.09 TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
 5 TACG… TE3F-…       346           200             1.91 TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
 6 TACG… TE3F-…       406           232             2.21 TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
 7 TACG… TE3F-…       291           173             1.65 TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
 8 TACG… TE3F-…         0             0             0    TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
 9 TACG… TE3F-…         0             0             0    TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
10 TACG… TE3F-…       203           120             1.14 TE3F-102_… ATACTTCGCAGGT   CAAGCAGAAGACGGC… ATACTTC… Male  TE3F… TE3F   ABX      
# … with 44,630 more rows, and 24 more variables: Age <chr>, EXP <chr>, barcode.rc_original <chr>, sequencing.pool <chr>, remove <int>,
#   Kingdom <chr>, Phylum <chr>, Class <chr>, Order <chr>, Family <chr>, Genus <chr>, Species <chr>, LDAupper <dbl>, LDAmean <dbl>,
#   LDAlower <dbl>, Sign_Strain <chr>, pvalue <dbl>, fdr <dbl>, LDAupper.y <dbl>, LDAmean.y <dbl>, LDAlower.y <dbl>, Sign_Strain.y <chr>,
#   pvalue.y <dbl>, fdr.y <dbl>
mpse_ps1_TE3F_TE4F_16wk.taxa.tree <- mpse_ps1_TE3F_TE4F_16wk %>% mp_extract_tree(type='taxatree')
mpse_ps1_TE3F_TE4F_16wk.taxa.tree 

# The associated data tibble abstraction: 442 × 18
# The 'node', 'label' and 'isTip' are from the phylo tree.
    node label          isTip nodeClass nodeDepth RareAbundanceBy… LDAupper LDAmean LDAlower Sign_Strain pvalue   fdr LDAupper.y LDAmean.y
   <int> <chr>          <lgl> <chr>         <dbl> <list>              <dbl>   <dbl>    <dbl> <chr>        <dbl> <dbl>      <dbl>     <dbl>
 1     1 TACGTAGGGTGCA… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
 2     2 TACGTAGGGGGCG… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
 3     3 TACGTAGGGGGCG… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
 4     4 TACGTAGGGAGCA… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
 5     5 TACGTATGGGGCA… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
 6     6 TACGTAGGGGGCA… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
 7     7 TACGTAGGGGGCG… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
 8     8 TACGTAGGGGGCG… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
 9     9 TACGTAGGGGGCG… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
10    10 TACGGAGGATCCG… TRUE  OTU               8 <tibble>               NA      NA       NA NA              NA    NA         NA        NA
# … with 432 more rows, and 4 more variables: LDAlower.y <dbl>, Sign_Strain.y <chr>, pvalue.y <dbl>, fdr.y <dbl>

But I want to modify my tree tip labels as Genus. But I keep failing. Could you please show me how to modify them?

Thanks! Yuhao

xiangpin commented 2 years ago

You can use mp_extract_tree with specific tip.level what you want.

> library(MicrobiotaProcess)
MicrobiotaProcess v1.7.8 For help:
https://github.com/YuLab-SMU/MicrobiotaProcess/issues

If you use MicrobiotaProcess in published research, please cite the
paper:

S Xu, L Zhan, W Tang, Z Dai, L Zhou, T Feng, M Chen, S Liu, X Fu, T Wu,
E Hu, G Yu. MicrobiotaProcess: A comprehensive R package for managing
and analyzing microbiome and other ecological data within the tidy
framework. 04 February 2022, PREPRINT (Version 1) available at Research
Square [https://doi.org/10.21203/rs.3.rs-1284357/v1]

This message can be suppressed by:
suppressPackageStartupMessages(library(MicrobiotaProcess))
> data(mouse.time.mpse)
> mouse.time.mpse %>% mp_rrarefy() %>% mp_diff_analysis(.abundance=RareAbundance, .group=time) %>% mp_extract_tree(tip.level=Genus)
The otutree is empty in the MPSE object!
'treedata' S4 object'.

...@ phylo:

Phylogenetic tree with 57 tips and 67 internal nodes.

Tip labels:
  g__Bifidobacterium, g__Olsenella, g__Enterorhabdus, g__Bacteroides,
g__un_f__Muribaculaceae, g__Alistipes, ...
Node labels:
  r__root, k__Bacteria, p__Actinobacteria, p__Bacteroidetes, p__Cyanobacteria,
p__Deinococcus-Thermus, ...

Rooted; no branch lengths.

with the following features available:
  'nodeClass', 'nodeDepth', 'RareAbundanceBySample', 'LDAupper', 'LDAmean',
'LDAlower', 'Sign_time', 'pvalue', 'fdr'.

# The associated data tibble abstraction: 124 × 12
# The 'node', 'label' and 'isTip' are from the phylo tree.
    node label      isTip nodeClass nodeDepth RareAbundanceByS… LDAupper LDAmean
   <int> <chr>      <lgl> <chr>         <dbl> <list>               <dbl>   <dbl>
 1     1 g__Bifido… TRUE  Genus             6 <tibble [19 × 4]>     3.39    3.36
 2     2 g__Olsene… TRUE  Genus             6 <tibble [19 × 4]>    NA      NA
 3     3 g__Entero… TRUE  Genus             6 <tibble [19 × 4]>    NA      NA
 4     4 g__Bacter… TRUE  Genus             6 <tibble [19 × 4]>    NA      NA
 5     5 g__un_f__… TRUE  Genus             6 <tibble [19 × 4]>     4.96    4.93
 6     6 g__Alisti… TRUE  Genus             6 <tibble [19 × 4]>     4.09    4.07
 7     7 g__un_o__… TRUE  Genus             6 <tibble [19 × 4]>    NA      NA
 8     8 g__Deinoc… TRUE  Genus             6 <tibble [19 × 4]>    NA      NA
 9     9 g__Lactob… TRUE  Genus             6 <tibble [19 × 4]>    NA      NA
10    10 g__Strept… TRUE  Genus             6 <tibble [19 × 4]>    NA      NA
# … with 114 more rows, and 4 more variables: LDAlower <dbl>, Sign_time <chr>,
#   pvalue <dbl>, fdr <dbl>

The result of mp_extract_tree is a treedata class, which can be processed and visualized by tidytree, treeio, ggtree and ggtreeExtra.

taekwonleo-yuhao commented 2 years ago

Thank you very much! Could you please explain what mp_rrarefy() is used for?

xiangpin commented 2 years ago

mp_rrarefy will generate one randomly rarefied community data, which will be added into assays slot.

taekwonleo-yuhao commented 2 years ago

Dear Xiangpin,

I have another question about the LDA bar plotting. I tried to modified codes from your example and put the genus name outside of the LDA bar. And it works for most of the cases. But I found that the LDA bar will become so wide in some of my data sets that I can not find the right way to fix it. image

Could you please give me some suggestions?

Thanks, Yuhao

Below are the codes I modified `p1 <- taxa.tree %>% ggtree( layout="radial", size = 0.3 ) + geom_point( data = td_filter(!isTip), fill="white", size=1, shape=21 )

display the high light of phylum clade.

p2 <- p1 + geom_hilight(

data = td_filter(nodeClass == "Phylum"), # This is supported by ggtree >=3.1.3

    mapping = aes(subset = nodeClass == "Phylum",
                  node = node,
                  fill = label)
  )

display the relative abundance of features(OTU)

p3 <- p2 + ggnewscale::new_scale("fill") + geom_fruit( data = td_unnest(RareAbundanceBySample), geom = geom_star, mapping = aes( x = Treatment, size = RelRareAbundanceBySample, fill = Treatment, subset = RelRareAbundanceBySample > 0 ), starshape = 13, starstroke = 0.25, offset = 0.04, pwidth = 0.1, grid.params = list( linetype=2) ) + scale_size_continuous( name="Relative Abundance (%)", range = c(1, 3) ) + scale_fill_manual(values=c("#00A087FF", "#3C5488FF"))

display the LDA of significant OTU.

p4 <- p3 + ggnewscale::new_scale("fill") + geom_fruit( geom = geom_col, mapping = aes( x = LDAmean, fill = Sign_Treatment, subset = !is.na(LDAmean) ), orientation = "y", offset = 0.1, pwidth = 0.4, axis.params = list(axis = "x", title = "Log10(LDA)", title.height = 0.01, title.size = 2, text.size = 1.8, vjust = 1), grid.params = list(linetype = 2) )

display the significant (FDR) taxonomy after kruskal.test (default)

p5 <- p4 + ggnewscale::new_scale("size") + geom_point( data=td_filter(!is.na(fdr)), mapping = aes(size = -log10(fdr), fill = Sign_Treatment, ), shape = 21, ) + scale_size_continuous(range=c(1, 3)) + scale_fill_manual(values=c("#00A087FF", "#3C5488FF"))

display the tip labels of taxa tree

p6 <- p5 + geom_tiplab(size=2, offset=4.2)

p6 <- p6 + theme( legend.key.height = unit(0.3, "cm"), legend.key.width = unit(0.3, "cm"), legend.spacing.y = unit(0.02, "cm"), legend.text = element_text(size = 7), legend.title = element_text(size = 9), plot.margin=margin(60,6,80,6) ) legend <- get_legend(p6) p7 <- p6 + theme(legend.position="none") p7`

xiangpin commented 2 years ago

Which version is ggtreeExtra on your platform?
The following is my platfrom.

> packageVersion('ggtreeExtra')
[1] ‘1.4.2’
>

How about updating it via BiocManager::install('ggtreeExtra')?

taekwonleo-yuhao commented 2 years ago

I updated ggtreeExtra to 1.4.2 version, but the problem is not fixed. Thanks.

xiangpin commented 2 years ago

How about removing the subset = !is.na(LDAmean) in display the LDA of significant OTU

taekwonleo-yuhao commented 2 years ago

Yes! Problem fixed after I remove subset = !is.na(LDAmean). I am so curious 🤨 why it will affect the wide of Bar. This line should only remove the LDAmean = NA samples.