ChiLiubio / microeco

An R package for data analysis in microbial community ecology
GNU General Public License v3.0

Getting -Inf LDA values when using LEfSe analysis #407

Closed (monaye745 closed this issue 1 month ago)

monaye745 commented 1 month ago

Hi!

I used the code below to analyze my metagenomic data:

lefse <- trans_diff$new(dataset = dataset, method = "lefse", group = cond, alpha = 0.05, p_adjust_method = "none", lefse_subgroup = NULL)

But the LDA values look strange:

[image]

As you can see, all LDA values are -Inf:

[image]

So what could be the reason for this result?

Thanks a lot!

ChiLiubio commented 1 month ago

Hi. Could you please attach your dataset so that I can reproduce your issue? To save the dataset, please follow the steps in the tutorial (https://chiliubio.github.io/microeco_tutorial/notes.html#save-function) and attach the compressed object. Please also provide your microeco version. Thanks.

monaye745 commented 1 month ago

sessionInfo()
R version 4.2.0 (2022-04-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /usr/local/lib64/R/lib/libRblas.so
LAPACK: /usr/local/lib64/R/lib/libRlapack.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] glue_1.6.2 dplyr_1.0.9 ggplot2_3.5.1 microeco_1.7.0

loaded via a namespace (and not attached):
[1] Rcpp_1.0.8.3 plyr_1.8.7 pillar_1.7.0 compiler_4.2.0
[5] RColorBrewer_1.1-3 tools_4.2.0 lifecycle_1.0.4 tibble_3.1.7
[9] nlme_3.1-157 gtable_0.3.0 lattice_0.20-45 mgcv_1.8-40
[13] pkgconfig_2.0.3 rlang_1.1.3 igraph_1.3.1 Matrix_1.4-1
[17] cli_3.6.2 DBI_1.1.2 parallel_4.2.0 withr_2.5.0
[21] stringr_1.4.0 cluster_2.1.3 generics_0.1.2 vctrs_0.6.5
[25] grid_4.2.0 tidyselect_1.1.2 data.table_1.14.2 R6_2.5.1
[29] fansi_1.0.3 reshape2_1.4.4 purrr_0.3.4 magrittr_2.0.3
[33] scales_1.3.0 ellipsis_0.3.2 MASS_7.3-57 splines_4.2.0
[37] assertthat_0.2.1 permute_0.9-7 ape_5.6-2 colorspace_2.0-3
[41] utf8_1.2.2 stringi_1.7.6 munsell_0.5.0 crayon_1.5.1
[45] vegan_2.6-2

monaye745 commented 1 month ago

dataset.zip

ChiLiubio commented 1 month ago

Hi. The likely cause is strong collinearity in the LDA step, resulting from too many input features (35383 in total). A temporary workaround is to filter out the features with very low abundance using the filter_thres parameter.

library(microeco)
load("dataset.RData")
# filter_thres = 0.0001 filters out features whose abundance is below this threshold
lefse <- trans_diff$new(dataset = dataset, method = "lefse", group = "Group", alpha = 0.05, p_adjust_method = "none", lefse_subgroup = NULL, filter_thres = 0.0001)
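
To make the effect of such an abundance filter concrete, here is a minimal base-R sketch on a made-up relative-abundance table. It assumes the filter compares each feature's mean relative abundance with the threshold. The table, the taxon names and the `keep` rule are all hypothetical, and microeco's internal handling of filter_thres may differ.

```r
# Toy relative-abundance table: rows are features, columns are samples.
# All values are invented for illustration; each column sums to 1.
abund <- matrix(
  c(0.60,    0.55,    0.70,    0.65,
    0.39995, 0.44990, 0.29996, 0.34992,
    0.00005, 0.00010, 0.00004, 0.00008),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("taxon_A", "taxon_B", "taxon_rare"), paste0("S", 1:4))
)

# Same value as filter_thres in the call above
thres <- 0.0001

# Assumed rule: keep features whose mean relative abundance reaches the threshold
keep <- rowMeans(abund) >= thres
filtered <- abund[keep, , drop = FALSE]

nrow(abund)         # number of features before filtering
nrow(filtered)      # number of features after filtering
rownames(filtered)  # names of the features that were kept
```

Running the same kind of count on the real table before and after filtering shows how many features actually reach the LDA step.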

Another option is to drop the Species level from the taxonomic table, so that species-level features are not included in the analysis.

library(microeco)
load("dataset.RData")

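d1 <- clone(dataset)
# keep only the first six taxonomic ranks, dropping the Species column (assumed to be column 7)
d1$tax_table <- d1$tax_table[, 1:6]
# recalculate the taxonomic abundance tables for the remaining ranks
d1$cal_abund()

lefse <- trans_diff$new(dataset = d1, method = "lefse", group = "Group", alpha = 0.05, p_adjust_method = "none", lefse_subgroup = NULL)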
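
As a rough illustration of why dropping the Species column reduces the number of features, the base-R sketch below counts distinct lineages in a made-up taxonomy table with and without its last rank. The table and its taxon names are hypothetical, and the sketch only mimics the column subsetting in d1$tax_table[, 1:6]. It does not reproduce what cal_abund() does internally.

```r
# Toy taxonomy table with seven ranks; all feature and species names are invented.
tax <- data.frame(
  Kingdom = rep("k__Bacteria", 4),
  Phylum  = rep("p__Firmicutes", 4),
  Class   = rep("c__Bacilli", 4),
  Order   = rep("o__Lactobacillales", 4),
  Family  = c("f__Lactobacillaceae", "f__Lactobacillaceae",
              "f__Streptococcaceae", "f__Streptococcaceae"),
  Genus   = c("g__Lactobacillus", "g__Lactobacillus",
              "g__Streptococcus", "g__Streptococcus"),
  Species = c("s__sp1", "s__sp2", "s__sp3", "s__sp4"),
  row.names = paste0("feature_", 1:4)
)

# Distinct lineages when the Species rank is included
n_with_species <- nrow(unique(tax))

# Keep only Kingdom to Genus (columns 1 to 6)
tax6 <- tax[, 1:6]

# Distinct lineages once Species is dropped
n_without_species <- nrow(unique(tax6))

n_with_species
n_without_species
```

In a real dataset the species level usually contributes the largest share of lineages, so removing it can shrink the input to the LDA considerably.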

monaye745 commented 1 month ago

Thank you for your reply! It works for me!