ChiLiubio / microeco

An R package for data analysis in microbial community ecology
GNU General Public License v3.0

differential abundance analysis: $plot_diff_bar versus $plot_diff_abund #206

Closed by juismo 8 months ago

juismo commented 1 year ago

Hi Chi,

thank you for the amazing package. I'm very interested in differential abundance analysis. I have tried the LDA and the RF approaches and generally they work well. Nevertheless, I have one question related to the LDA and MeanDecreaseGini scores. For some taxa, the bar color in $plot_diff_bar (like the "Group" column of the $res_diff table) indicates that the taxon is enriched or more abundant in, say, group A. But in the abundance plot ($plot_diff_abund) the same taxon is more abundant in group B. I thought this was related to my data, but now I have seen the same thing in the tutorial (e.g. for g_Kroppenstedtia the bar is orange = CW group in $plot_diff_bar, whereas $plot_diff_abund shows the highest abundance in TW = green bar). Am I wrong? Is this a misinterpretation on my part?

Thank you for your help!

ChiLiubio commented 1 year ago

Hi. Thanks for your appreciation! Both the lefse and rf methods run a KW (Kruskal-Wallis) test before the subsequent statistics to check for significant differences among groups. Because KW is non-parametric, the enriched group is determined by comparing medians instead of means. The abundance bar plot, however, shows mean values. So the two plots can sometimes look inconsistent, especially for taxa whose means and medians are similar across groups. The mean and error bars depend on the data distribution, but the data ranks do not. I guess this is the key point; I have also noticed it before. If you think it may be a bug, please provide the complete steps with the script and data (https://chiliubio.github.io/microeco_tutorial/notes.html#save-function) so that I can reproduce your issue.

Chi
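
To make the mean-versus-median point above concrete, here is a minimal sketch. It does not call microeco. The abundance values are made up for illustration, and `mean_ranks` is a hypothetical helper, not part of any package. It only shows how a single large value can make one group win on the mean while the other group wins on the median and on pooled ranks (the quantity a rank-based test such as KW works with).

```python
from statistics import mean, median


def mean_ranks(groups):
    """Average pooled rank per group; tied values share their average rank."""
    pooled = sorted(v for values in groups.values() for v in values)
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Zero-based positions i..j-1 hold 1-based ranks i+1..j; store their mean.
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    return {name: mean(rank_of[v] for v in values) for name, values in groups.items()}


# Made-up abundances for one taxon in two groups (illustrative only).
groups = {
    "CW": [5, 6, 6, 7, 7, 8],   # consistently moderate
    "TW": [0, 0, 1, 1, 2, 60],  # mostly low, one very large value
}

ranks = mean_ranks(groups)
by_mean = max(groups, key=lambda g: mean(groups[g]))
by_median = max(groups, key=lambda g: median(groups[g]))
by_rank = max(ranks, key=ranks.get)

for name, values in groups.items():
    print(name, "mean:", round(mean(values), 2),
          "median:", median(values), "mean rank:", ranks[name])
print("highest mean:", by_mean)
print("highest median:", by_median)
print("highest mean rank:", by_rank)
```

With these values, TW has the higher mean (so it would look taller in a mean-based bar plot), while CW has the higher median and mean rank. That is the same kind of mismatch described in the question.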