YuLab-SMU / MicrobiotaProcess

:microbe: A comprehensive R package for deep mining microbiome
https://www.sciencedirect.com/science/article/pii/S2666675823000164

Filtering data #71

Open DanielSoutoV opened 2 years ago

DanielSoutoV commented 2 years ago

Dear Xiang Pin,

An amazing package with really great visualization options! Thanks a lot! I am following the workshop with my own data and I am able to do everything, but I was wondering whether, in the biomarker discovery section, there is a way to further filter the previously created 'deres' object so that it includes only a specific class/order. I am working with insects and I would like to have individual plots for, e.g., Coleoptera and Lepidoptera.

I am sure there is an easy way to do this with the dplyr filter function, but I am afraid I am not totally sure how!

Thanks a lot for your support and the package, once again.

Regards

DanielSoutoV commented 2 years ago

Hi again,

I managed to filter by order, but I did it manually, which was not a huge issue. I am now trying to rerun all analyses separately for each order, but I am running into some problems.

For two orders, I am getting an error for diffclade:

Error in df[, match(c("y", "yend"), colnames(df))] : incorrect number of dimensions

which I suspect is because of an issue with diff_analysis:

deresbees
The original data: 137 features and 40 samples
The sample data: 1 variables and 40 samples
The taxda contained 69 by 8 rank
after first test (kruskal_test) number of feature (pvalue<=0.05):34
after second test (wilcox_test and generalizedFC) number of significantly discriminative feature:34
after lda, Number of discriminative features: 0 (certain taxonomy classification:0; uncertain taxonomy classication: 0)

Is the problem that the number of discriminative features is 0? Apologies for the basic questions, but any help will be very much appreciated!

thanks!

xiangpin commented 2 years ago

I am sorry for replying so late. I am not sure I fully understand your issue. MicrobiotaProcess has been updated to follow the tidy framework, and we now recommend using the newest version (v1.8.2). You can use mp_diff_analysis to do this.

> library(MicrobiotaProcess)
> library(ggplot2)
> data(mouse.time.mpse)
> mouse.time.mpse %>% mp_rrarefy() %>% mp_diff_analysis(.abundance=RareAbundance, .group=time, first.test.alpha=.05, filter.p='fdr') %>% mp_plot_diff_cladogram()
> mpse.diff.res <- mouse.time.mpse %>% mp_rrarefy() %>% mp_diff_analysis(.abundance=RareAbundance, .group=time, first.test.alpha=.05, filter.p='fdr')
> mpse.diff.res %>% mp_extract_taxatree() -> taxatree.diffres
> taxatree.diffres
'treedata' S4 object'.

...@ phylo:

Phylogenetic tree with 218 tips and 186 internal nodes.

Tip labels:
  OTU_67, OTU_231, OTU_188, OTU_150, OTU_207, OTU_5, ...
Node labels:
  r__root, k__Bacteria, p__Actinobacteria, p__Bacteroidetes, p__Cyanobacteria,
p__Deinococcus-Thermus, ...

Rooted; no branch lengths.

with the following features available:
  'nodeClass', 'nodeDepth', 'RareAbundanceBySample', 'LDAupper', 'LDAmean',
'LDAlower', 'Sign_time', 'pvalue', 'fdr'.

# The associated data tibble abstraction: 404 × 12
# The 'node', 'label' and 'isTip' are from the phylo tree.
    node label   isTip nodeCl…¹ nodeD…² RareAb…³ LDAup…⁴ LDAmean LDAlo…⁵ Sign_…⁶
   <int> <chr>   <lgl> <chr>      <dbl> <list>     <dbl>   <dbl>   <dbl> <chr>
 1     1 OTU_67  TRUE  OTU            8 <tibble>    3.39    3.36    3.32 Late
 2     2 OTU_231 TRUE  OTU            8 <tibble>   NA      NA      NA    NA
 3     3 OTU_188 TRUE  OTU            8 <tibble>   NA      NA      NA    NA
 4     4 OTU_150 TRUE  OTU            8 <tibble>   NA      NA      NA    NA
 5     5 OTU_207 TRUE  OTU            8 <tibble>   NA      NA      NA    NA
 6     6 OTU_5   TRUE  OTU            8 <tibble>   NA      NA      NA    NA
 7     7 OTU_1   TRUE  OTU            8 <tibble>   NA      NA      NA    NA
 8     8 OTU_2   TRUE  OTU            8 <tibble>   NA      NA      NA    NA
 9     9 OTU_3   TRUE  OTU            8 <tibble>   NA      NA      NA    NA
10    10 OTU_4   TRUE  OTU            8 <tibble>    4.40    4.38    4.36 Late
# … with 394 more rows, 2 more variables: pvalue <dbl>, fdr <dbl>, and
#   abbreviated variable names ¹​nodeClass, ²​nodeDepth, ³​RareAbundanceBySample,
#   ⁴​LDAupper, ⁵​LDAlower, ⁶​Sign_time
# ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names

Then you can use dplyr::filter to filter the result, for example to keep only the class-level nodes (labels containing 'c__'):

> taxatree.diffres %>% dplyr::filter(grepl('c__', label)) %>% mp_plot_diff_cladogram() + scale_fill_diff_cladogram(values=c('deepskyblue', 'orange'))
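
The filtering step above can be tried out without MicrobiotaProcess on a small stand-in table. The sketch below is only an illustration: the data frame `toy`, its rows and its values are invented here, and only the column names (label, nodeClass, LDAmean, Sign_time) are taken from the printed output above. It shows two filters: keeping nodes of one taxonomic level by label prefix, and keeping only nodes that have an LDA score.

```r
library(dplyr)

# Hypothetical stand-in for the tibble view of the tree data.
# Every value below is invented for illustration only.
toy <- data.frame(
  label     = c("p__Firmicutes", "c__Bacilli", "c__Clostridia",
                "o__Lactobacillales", "OTU_1"),
  nodeClass = c("Phylum", "Class", "Class", "Order", "OTU"),
  LDAmean   = c(NA, 4.1, NA, 3.6, 3.4),
  Sign_time = c(NA, "Late", NA, "Early", "Late"),
  stringsAsFactors = FALSE
)

# Keep only class-level nodes: labels that start with "c__".
class_nodes <- toy %>% filter(grepl("^c__", label))

# Keep only nodes with a non-missing LDA score.
signif_nodes <- toy %>% filter(!is.na(LDAmean))

print(class_nodes)
print(signif_nodes)

# Number of rows left after filtering; zero would mean nothing to highlight.
print(nrow(signif_nodes))
```

Anchoring the pattern with `^` avoids matching labels that merely contain 'c__' somewhere in the middle. Checking the row count after a filter is a cheap sanity check before plotting, since an empty result leaves nothing to draw.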
