YuLab-SMU / MicrobiotaProcess

:microbe: A comprehensive R package for deep mining microbiome
https://www.sciencedirect.com/science/article/pii/S2666675823000164
182 stars 37 forks source link

Summarizing data from mpse object without changing class #91

Closed etorres475 closed 1 year ago

etorres475 commented 1 year ago

Hi Shuangbin,

In my experimental design, I sequenced some samples and sub-samples. I would like to summarize and calculate the mean abundances for those samples and sub-samples so they represent one single sample in the posterior analysis with MicrobiotaProcess. When I try to do it using dpyr on the mpse object, it is converted into a tibble data (tbl_mpse object) and then into a tbl_df. I tried using as.mpse on the resulting file but it ended in NULL (please see the screenshots attached). Is it possible to do this in some other way?

Cheers!

Screenshot 2023-04-22 at 5 16 11 PM Screenshot 2023-04-22 at 5 22 49 PM Screenshot 2023-04-22 at 5 23 31 PM
etorres475 commented 1 year ago

I am getting another error when using mp_cal_rarecurve in:

mpse12.a <- mpse12 %>% mp_rrarefy() %>% mp_cal_alpha(.abundance = RareAbundance) %>% mp_cal_rarecurve(.abundance = RareAbundance)

Screenshot 2023-04-22 at 7 19 11 PM

Data can be found at: https://www.dropbox.com/s/eeblzk2uqggyv6o/mpse12.obj?dl=0

etorres475 commented 1 year ago

Sorry! Another question, I am generating very nice plots of taxa abundance per group, but I would like to know if the differences I see are statistically significant and extract the data. How can I do it?

Screenshot 2023-04-22 at 8 08 29 PM
xiangpin commented 1 year ago

you can refer to the related issue and answer

xiangpin commented 1 year ago

In my experimental design, I sequenced some samples and sub-samples. I would like to summarize and calculate the mean abundances for those samples and sub-samples so they represent one single sample in the posterior analysis with MicrobiotaProcess. When I try to do it using dpyr on the mpse object, it is converted into a tibble data (tbl_mpse object) and then into a tbl_df. I tried using as.mpse on the resulting file but it ended in NULL (please see the screenshots attached). Is it possible to do this in some other way?

I think you can use mp_aggregate to do it.

library(MicrobiotaProcess)
data(mouse.time.mpse)
mouse.time.mpse %>% mp_aggregate(.group=time)
etorres475 commented 1 year ago

you can refer to the related issue and answer

Thanks a lot for your help. I want to know which Phyla are differentially expressed, not the phyla to which the biomarkers belong, so I need to specify that I want to calculate the mp_diff_analysis in the Phyla.

I am trying the following:

b.ptt.16 <- b.mpse16 %>% mp_rrarefy() %>% mp_diff_analysis(.abundance=RareAbundance, .group=Plants_grown_in_soil, p.adjust = "fdr",first.test.alpha = 0.05,filter.p = "pvalue", taxa.class = "Phylum") %>% mp_extract_taxatree()

But I get the following warning:

Screenshot 2023-04-23 at 1 51 00 PM

What am I doing wrong?

etorres475 commented 1 year ago

mouse.time.mpse %>% mp_aggregate(.group=time)

I got this:

Screenshot 2023-04-23 at 2 03 19 PM
xiangpin commented 1 year ago

First, you can extract the differential result (taxatree slot, a treedata object ).

b.mpse16 %>% mp_rrarefy() %>% mp_diff_analysis(.abundance=RareAbundance, .group=Plants_grown_in_soil, p.adjust = "fdr",first.test.alpha = 0.05,filter.p = "pvalue")  %>% mp_extract_taxatree() -> daa.td 

The daa.td is a treedata object which can be used ggtree, treeio, tidytree and ggtreeExtra directly. You can find nodeClass contains different taxonomy.

> daa.td %>% dplyr::pull(nodeClass) %>% unique
[1] "OTU"     "Root"    "Kingdom" "Phylum"  "Class"   "Order"   "Family"
[8] "Genus"   "Speies"

Then, you can use filter of dplyr to choose the specified taxa.

> daa.td %>% dplyr::filter(nodeClass == 'Phylum' & !is.na(Sign_Plants_grown_in_soil), keep.td=F)
# A tibble: 1 × 12
   node label   isTip nodeClass nodeDepth RareAbundanceBySample LDAupper LDAmean
  <int> <chr>   <lgl> <chr>         <dbl> <list>                   <dbl>   <dbl>
1  2584 p__Aci… FALSE Phylum            2 <tibble [45 × 5]>         4.19    4.15
# ℹ 4 more variables: LDAlower <dbl>, Sign_Plants_grown_in_soil <chr>,
#   pvalue <dbl>, fdr <dbl>                                                     

Please re-install MicrobiotaProcess by remotes::install_github(YuLab-SMU/MicrobiotaProcess).

etorres475 commented 1 year ago

It worked perfectly! Thanks a lot! :)

The only remaining error I keep getting is this:

I am getting another error when using mp_cal_rarecurve in:

mpse12.a <- mpse12 %>% mp_rrarefy() %>% mp_cal_alpha(.abundance = RareAbundance) %>% mp_cal_rarecurve(.abundance = RareAbundance)

Screenshot 2023-04-22 at 7 19 11 PM

Data can be found at: https://www.dropbox.com/s/eeblzk2uqggyv6o/mpse12.obj?dl=0

I checked and apparently, something is going on after filtering the features with low abundance (last step in the following command):

otuqzafile_12 <- "./its_12/table_its_12_filtered.qza" taxaqzafile_12 <- "./its_12/taxonomy-its-12.qza" mapfile_12 <- "./its_12/sample-metadata_its_12.txt" mpse12 <- mp_import_qiime2(otuqza=otuqzafile_12, taxaqza=taxaqzafile_12, mapfilename=mapfile_12)

taxa_12 <- taxonomy(mpse12) taxa_12[apply(taxa_12, 2, function(i)grepl("_un", i))] <- NA taxa_12 <- apply(taxa_12, 2, function(i)gsub("*.__", "", i)) taxonomy(mpse12) <- taxa_12 rownames(mpse12) <- paste0('ASV', seq_len(nrow(mpse12)))

mpse12 %>% mp_extract_feature(addtaxa=T) %>% dplyr::pull(Phylum) %>% unique() mpse12 <- mpse12 %>% dplyr::filter(!Phylum %in% c("pFungi_phy_Incertae_sedis", "p__un_kFungi"))

mpse12 <- mpse12 %>% mp_filter_taxa(.abundance=Abundance, min.abun=0, min.prop=.1)

If I skip this last step I can calculate my rarefaction curve.

Screenshot 2023-04-24 at 11 03 44 AM
etorres475 commented 1 year ago

Thanks a lot!