ChiLiubio / microeco

An R package for data analysis in microbial community ecology
GNU General Public License v3.0
Error when running lefse analysis #303

Closed LAmethyst closed 6 months ago

LAmethyst commented 9 months ago

I ran the lefse analysis to find KEGG KO biomarkers among 3 groups. Below is the script.

set.seed(20231231)
ko_group <- trans_diff$new(
    dataset = ko_dat,
    method = "lefse",
    group = "Group",
    alpha = 0.01,
    p_adjust_method = "fdr",
    lefse_norm = 1000000,
    nresam = 0.6667,
    boots = 100
)

I encountered the error shown below:

No taxa_abund list found. Calculate it with cal_abund function ...
The result is stored in object$taxa_abund ...
9940 input features ...
9940 features are remained after removing unknown features ...
Start Kruskal-Wallis rank sum test for Group ...
1926 taxa found significant ...
After P value adjustment, 459 taxa found significant ...
Error in max(.) : (converted from warning) no non-missing arguments to max; returning -Inf

Is there anything I can do to solve the problem? Many thanks if anyone can help.

ChiLiubio commented 9 months ago

Hi. Could you please attach your data ko_dat here so that I can reproduce your issue? To save the data, please follow the steps in the tutorial (https://chiliubio.github.io/microeco_tutorial/notes.html#save-function) and attach the compressed object. Thanks.

Best, Chi

LAmethyst commented 9 months ago

Hi Chi,

Many thanks for your kind reply.
I set options(warn = 0) and tried again in RStudio. The function ran normally and returned the result, although I got warnings about collinearity.
The source data, ko_dat.RData, is attached.
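
The "(converted from warning)" prefix in the error above is what R prints when the warn option is set to 2 or higher, which turns every warning into an error. That is consistent with the run succeeding after options(warn = 0). The base-R sketch below reproduces the same kind of message on an empty vector. It is an illustration only: it does not use microeco or ko_dat, and it does not establish where the empty input arises inside trans_diff.

```r
# Illustration only: base R, no microeco objects involved.
old_opts <- options(warn = 2)   # warn >= 2: warnings are turned into errors

# max() on an empty numeric vector normally warns and returns -Inf;
# under warn = 2 the same call stops with an error instead.
strict_msg <- tryCatch(max(numeric(0)), error = function(e) conditionMessage(e))
print(strict_msg)

options(warn = 0)               # default level: warnings are reported, not fatal
relaxed <- suppressWarnings(max(numeric(0)))
print(relaxed)                  # -Inf

options(old_opts)               # restore the previous setting
```

Lowering the warning level lets the analysis finish, but the underlying warning is still worth checking, since a -Inf value may propagate into the results.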

I encountered another problem when running a different dataset, ARO_dat.RData. Besides the warnings about collinearity, pgpB was identified as enriched in the FE group, but the boxplot of relative abundance does not appear to support that. Could you help me out if possible? ARO_dat.RData is also attached. Below is the script.

1. LEfSe

set.seed(20231231)
ARO_group <- trans_diff$new(
    dataset = ARO_dat,
    method = "lefse",
    group = "Group",
    alpha = 0.01,
    p_adjust_method = "fdr",
    lefse_norm = 1000000,
    nresam = 0.6667,
    boots = 100
)
write.csv(ARO_group$res_diff, "./LEfSe/ARO_group.diff.csv", quote = FALSE, row.names = FALSE)

2. Boxplot

ARO_group$plot_diff_abund(
    select_taxa = ARO_group$plot_diff_bar_taxa,  # keep consistent with the LDA score plot
    width = 0.5,
    add_sig = TRUE,
    add_sig_label = "Significance",  # can be replaced with another column name in the res_diff table
    coord_flip = TRUE  # arrange vertically
)

Best, Qiong

ChiLiubio commented 9 months ago

Hi. The attachment was not found. Please attach it on the GitHub issue page or send it to me personally via email (liuchi0426@126.com).

ChiLiubio commented 9 months ago

Hi. The pgpB issue comes from the comparison of the values among the three groups. lefse outputs the median of each group, not the mean. But the Stable and FE groups have the same median, i.e. 0, so the function cannot determine which group is larger and generates a weird result. The barplot shows the mean and sd/se, which look different between the groups. I will try to fix this. Thanks.

ARO_group$abund_table %>% .["pgpB", grepl("FE", colnames(.))] %>% unlist
ARO_group$abund_table %>% .["pgpB", grepl("Stable", colnames(.))] %>% unlist %>% median
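
The tie described above can be shown with a small base-R sketch. The abundance values are hypothetical and are not taken from ARO_dat, and the use of which.max() is only an assumption to show why tied medians are ambiguous. It is not a description of how microeco selects the enriched group.

```r
# Hypothetical abundances for one feature; NOT taken from ARO_dat.
fe     <- c(0, 0, 0, 0, 12, 30)
stable <- c(0, 0, 0, 0, 0, 1)

medians <- c(FE = median(fe), Stable = median(stable))
means   <- c(FE = mean(fe),   Stable = mean(stable))

print(medians)  # both medians are 0, so they cannot rank the two groups
print(means)    # the means differ, which is what a mean and sd/se plot shows

# which.max() returns the first maximum, so with tied medians the reported
# group depends only on the order of the elements.
print(names(which.max(medians)))
print(names(which.max(rev(medians))))
```

With sparse features like this, where most samples are zero, a median-based comparison and a mean-based plot can point in different directions.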