YuLab-SMU / MicrobiotaProcess

:microbe: A comprehensive R package for deep mining microbiome
https://www.sciencedirect.com/science/article/pii/S2666675823000164
182 stars 37 forks source link

rarefaction for first quartile? #76

Open marwa38 opened 1 year ago

marwa38 commented 1 year ago

I want to rarefy my samples to the first quartile. i don't understand how mp_cal_rarecurve() works with this?

mouse.time.mpse %<>% 
    mp_cal_rarecurve(
        .abundance = RareAbundance,
        chunks = 400
    )

I tried to do the rarefaction before converting my phyloseq to mpse but I could not go futher the downstream analysis from there as I get an error

summary(sample_sums(ps.prev))
ps.prev.rare <- rarefy_even_depth(ps.prev, sample.size = 3501, rngseed = 50)
  Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    641    3501   20304   52969   68311  321517 
`set.seed(50)` was used to initialize repeatable random subsampling.
Please record this for your records so others can reproduce.
Try `set.seed(50); .Random.seed` for the full vector
...
32 samples removedbecause they contained fewer reads than `sample.size`.
Up to first five removed samples are: 

M-T1-Rep4M-T5-Rep2M-T5-Rep3M-T5-Rep4M-T5-Rep6
...
2025OTUs were removed because they are no longer 
present in any sample after random subsampling
mpse.rare <- ps.prev.rare %>% as.MPSE()
mpse.rare %>% print(width=180)
# A MPSE-tibble (MPSE object) abstraction: 286,732 × 20
# OTU=2956 | Samples=97 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class, Order,
#   Family, Genus, Species
   OTU   Sample      Abundance sample.name sample.no Sample.x Phase Region Regime
   <chr> <chr>           <dbl> <chr>           <int> <fct>    <fct> <fct>  <fct> 
 1 ASV1  feed-M-Rep1         6 M1                170 feed     feed  feed   M     
 2 ASV3  feed-M-Rep1       424 M1                170 feed     feed  feed   M     
 3 ASV5  feed-M-Rep1         0 M1                170 feed     feed  feed   M     
 4 ASV6  feed-M-Rep1         0 M1                170 feed     feed  feed   M     
 5 ASV7  feed-M-Rep1         0 M1                170 feed     feed  feed   M     
 6 ASV8  feed-M-Rep1         0 M1                170 feed     feed  feed   M     
 7 ASV9  feed-M-Rep1         0 M1                170 feed     feed  feed   M     
 8 ASV10 feed-M-Rep1         0 M1                170 feed     feed  feed   M     
 9 ASV11 feed-M-Rep1         0 M1                170 feed     feed  feed   M     
10 ASV12 feed-M-Rep1         0 M1                170 feed     feed  feed   M     
   Sample_Regime sample_regime sample sample_or_control Kingdom    
   <fct>         <chr>         <chr>  <chr>             <chr>      
 1 feed.M        feed.M        feed   sample            k__Bacteria
 2 feed.M        feed.M        feed   sample            k__Bacteria
 3 feed.M        feed.M        feed   sample            k__Bacteria
 4 feed.M        feed.M        feed   sample            k__Bacteria
 5 feed.M        feed.M        feed   sample            k__Bacteria
 6 feed.M        feed.M        feed   sample            k__Bacteria
 7 feed.M        feed.M        feed   sample            k__Bacteria
 8 feed.M        feed.M        feed   sample            k__Bacteria
 9 feed.M        feed.M        feed   sample            k__Bacteria
10 feed.M        feed.M        feed   sample            k__Bacteria
   Phylum               Class                  Order               Family Genus Species
   <chr>                <chr>                  <chr>               <chr>  <chr> <chr>  
 1 p__Firmicutes        c__Clostridia          o__Oscillospirales  f__Ru… g__u… s__un_…
 2 p__Firmicutes        c__Bacilli             o__Lactobacillales  f__La… g__L… s__un_…
 3 p__Proteobacteria    c__Alphaproteobacteria o__Rhodobacterales  f__Rh… g__P… s__un_…
 4 p__Proteobacteria    c__Gammaproteobacteria o__Aeromonadales    f__Ae… g__A… s__un_…
 5 p__Bacteroidota      c__Bacteroidia         o__Cytophagales     f__Sp… g__A… s__un_…
 6 p__Proteobacteria    c__Gammaproteobacteria o__Burkholderiales  f__Co… g__P… s__un_…
 7 p__Verrucomicrobiota c__Verrucomicrobiae    o__Verrucomicrobia… f__Ru… g__L… s__un_…
 8 p__Proteobacteria    c__Alphaproteobacteria o__Rhodobacterales  f__Rh… g__R… s__un_…
 9 p__Proteobacteria    c__Gammaproteobacteria o__Enterobacterales f__En… g__E… s__un_…
10 p__Proteobacteria    c__Gammaproteobacteria o__Burkholderiales  f__Su… g__A… s__un_…
# … with 286,722 more rows
# ℹ Use `print(n = ...)` to see more rows

mpse.rare %>% 
      mp_plot_rarecurve(
         .rare = Abundance, 
         .alpha = Observe,
      )
Error in `dplyr::select()`:
! Can't subset columns that don't exist.
✖ Column `Abundance` doesn't exist.
Backtrace:
 1. mpse.rare %>% ...
 8. dplyr:::select.data.frame(., !!rlang::sym("Sample"), !!.rare)
 Error in dplyr::select(., !!rlang::sym("Sample"), !!.rare) :
marwa38 commented 1 year ago

update: I used mp_rrarefy(raresize = 3073) but still getting the error Error in if (obj %>% rowSums %>% var != 0 && !force) { : missing value where TRUE/FALSE needed

ps.prev <- readRDS("ps.prev.f3.rds")
ps.prev.intes <- subset_samples(ps.prev, Sample == "intestine")
ps.prev.intes <- prune_taxa(taxa_sums(ps.prev.intes) > 0, ps.prev.intes)

summary(sample_sums(ps.prev.intes))

mpse <- ps.prev.intes %>% as.MPSE() 
mpse %<>% mp_rrarefy(raresize = 3073)
mpse %<>% 
    mp_cal_rarecurve(
        .abundance = RareAbundance,
        chunks = 400
    )
Error in if (obj %>% rowSums %>% var != 0 && !force) { : 
missing value where TRUE/FALSE needed
mpse %>% print(width=180)
marwa38 commented 1 year ago

any advice on this please? @xiangpin

marwa38 commented 1 year ago

Please let me know of any more info required from me @xiangpin

xiangpin commented 1 year ago

Sorry, I can not generate the same error without your input. Would you mind seeding the input object to me? Or you can set force = TRUE in mp_cal_rarecurve since you rarefy your samples to the first quartile, which causes the sum reads of each sample still is unequal. You can refer to the following codes.

> data(mouse.time.mpse)
> mouse.time.mpse
# A MPSE-tibble (MPSE object) abstraction: 4,142 × 11
# OTU=218 | Samples=19 | Assays=Abundance | Taxonomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
   OTU    Sample Abundance time  Kingdom Phylum Class Order Family Genus Species
   <chr>  <chr>      <int> <chr> <chr>   <chr>  <chr> <chr> <chr>  <chr> <chr>
 1 OTU_1  F3D0         579 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 2 OTU_2  F3D0         345 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 3 OTU_3  F3D0         449 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 4 OTU_4  F3D0         430 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 5 OTU_5  F3D0         154 Early k__Bac… p__Ba… c__B… o__B… f__Ba… g__B… s__un_…
 6 OTU_6  F3D0         470 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 7 OTU_7  F3D0         282 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
 8 OTU_8  F3D0         184 Early k__Bac… p__Ba… c__B… o__B… f__Ri… g__A… s__un_…
 9 OTU_9  F3D0          45 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
10 OTU_10 F3D0         158 Early k__Bac… p__Ba… c__B… o__B… f__Mu… g__u… s__un_…
# … with 4,132 more rows
# ℹ Use `print(n = ...)` to see more rows
> mouse.time.mpse %>% mp_extract_assays(.abundance=Abundance) %>% colSums %>% quantile()
     0%     25%     50%     75%    100%
 2518.0  4048.0  5017.0  6603.5 16835.0
> mouse.time.mpse %>% mp_rrarefy(.abundance=Abundance, raresize=4048) %>% mp_cal_rarecurve(.abundance=RareAbundance, force=T)
# A MPSE-tibble (MPSE object) abstraction: 4,142 × 13
# OTU=218 | Samples=19 | Assays=Abundance, RareAbundance | Taxonomy=Kingdom, Phylum, Class, Order, Family, Genus, Species
   OTU   Sample Abund…¹ RareA…² time  RareAb…³ Kingdom Phylum Class Order Family
   <chr> <chr>    <int>   <int> <chr> <list>   <chr>   <chr>  <chr> <chr> <chr>
 1 OTU_1 F3D0       579     343 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Mu…
 2 OTU_2 F3D0       345     204 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Mu…
 3 OTU_3 F3D0       449     291 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Mu…
 4 OTU_4 F3D0       430     277 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Mu…
 5 OTU_5 F3D0       154      85 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Ba…
 6 OTU_6 F3D0       470     290 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Mu…
 7 OTU_7 F3D0       282     179 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Mu…
 8 OTU_8 F3D0       184     117 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Ri…
 9 OTU_9 F3D0        45      27 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Mu…
10 OTU_… F3D0       158     100 Early <tibble> k__Bac… p__Ba… c__B… o__B… f__Mu…
# … with 4,132 more rows, 2 more variables: Genus <chr>, Species <chr>, and
#   abbreviated variable names ¹​Abundance, ²​RareAbundance,
#   ³​RareAbundanceRarecurve
# ℹ Use `print(n = ...)` to see more rows.
marwa38 commented 1 year ago

sorry, I thought I attached it earlier Kindly find attached one of my phyloseq object ps.prev.rooted.zip