FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7

Difference between primary result and global test result #140

Closed sunsvet closed 1 year ago

sunsvet commented 1 year ago

Hello Frederick,

Thanks for the updated tutorials on ANCOMBC and ANCOMBC2. Looking at the updated ANCOMBC tutorial (version 18/12/2022), I am struggling to understand the difference between the heat maps (and the results in general) for the log fold changes from the primary result (4.2) and those from the global test result (4.3), and I am not sure why their findings differ. Initially, my understanding was that the global test is applied when you have 2 or more groups for a specific variable, and that the primary result is just for 2 groups. However, I now think I was wrong.

Another question I have is regarding the bias correction. How does ANCOM-BC incorporate the sampling fraction (considering I haven't input the exact population, but rather just my sample) and the sequencing efficiency? I have read the ANCOM-BC paper, but my background in stats is quite minimal, so I don't fully understand it.

I'm sorry for taking your time and thank you again for your continuous efforts to improve the package!

FrederickHuangLin commented 1 year ago

Hi @sunsvet,

These are great questions!

For your first question, regarding the difference between the ancombc primary results and the global test results, suppose we have three groups, A, B, and C, and our primary interest is the group effect.

The ancombc primary results focus on the differences between the reference group (group A in this example) and each of the other groups, i.e., B - A and C - A. The ancombc global test, by contrast, aims to identify taxa that differ in at least one group, so in this example any taxon that is significantly differentially abundant across groups A, B, and C will be detected by the global test. Thus, the hypotheses for the primary results and the global test are different, and they use different test statistics. Think of pairwise t-tests versus ANOVA: they should give you fairly consistent results, but some differences are to be expected since the two methods use different statistics.
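To make the t-test versus ANOVA analogy concrete, here is a minimal Python sketch using only the standard library. It is not what ancombc computes internally; the data are made-up log abundances for a single taxon, and the helper names `t_statistic` and `f_statistic` are hypothetical. It only illustrates that the pairwise comparisons against a reference group and the global comparison are summarized by different statistics.

```python
from statistics import mean, variance


def t_statistic(x, ref):
    """Welch two-sample t statistic for mean(x) - mean(ref)."""
    se = (variance(x) / len(x) + variance(ref) / len(ref)) ** 0.5
    return (mean(x) - mean(ref)) / se


def f_statistic(groups):
    """One-way ANOVA F statistic across all groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Made-up log abundances of one taxon; group A is the reference.
a = [5.0, 5.2, 4.9, 5.1]
b = [5.1, 5.3, 5.0, 5.2]
c = [6.0, 6.3, 5.9, 6.2]

# "Primary result" analogue: one statistic per contrast with the reference.
t_b_vs_a = t_statistic(b, a)
t_c_vs_a = t_statistic(c, a)

# "Global test" analogue: a single statistic for "any group differs".
f_all = f_statistic([a, b, c])

print(f"B - A: t = {t_b_vs_a:.2f}")
print(f"C - A: t = {t_c_vs_a:.2f}")
print(f"A, B, C: F = {f_all:.2f}")
```

In this toy data the B - A contrast is weak while the C - A contrast is strong, and the global statistic reflects that at least one group stands out without saying which one.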

For your second question, ANCOM-BC estimates sampling fractions, which differ across samples, and then models the log of the observed data with a linear regression model that includes the estimated sampling fraction as an offset term. This is essentially a normalization approach that attempts to recover the absolute abundances of taxa. ANCOM-BC does not account for differences in sequencing efficiency, which is one of the reasons we developed ANCOM-BC2. We will submit the ANCOM-BC2 paper in March, and it will include a more detailed discussion of sampling fractions (sample-specific biases) and sequencing efficiencies (feature-specific biases). Stay tuned!
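As a rough illustration of the offset idea, here is a toy Python sketch. It assumes the sampling fractions are known, whereas ANCOM-BC estimates them from the data (that estimation step is not shown). All numbers and names, such as `true_abundance` and `fractions`, are hypothetical. The taxon has the same absolute abundance in both groups, so the true log fold change is 0.

```python
import math
from statistics import mean

# Hypothetical absolute abundance of one taxon, identical in both groups.
true_abundance = 1000.0

# Assumed sample-specific sampling fractions (treated as known here;
# ANCOM-BC would estimate them from the data).
fractions = {
    "A": [0.10, 0.12, 0.08],
    "B": [0.30, 0.35, 0.25],
}

# Observed abundance = absolute abundance * sampling fraction, so on the
# log scale the sampling fraction enters as an additive, per-sample term.
log_observed = {
    group: [math.log(true_abundance * frac) for frac in fracs]
    for group, fracs in fractions.items()
}

# Naive log fold change (B - A) on the observed data is biased because
# the groups were sampled at different depths.
naive_lfc = mean(log_observed["B"]) - mean(log_observed["A"])

# Subtracting the per-sample offset log(fraction) removes that bias.
log_corrected = {
    group: [y - math.log(frac) for y, frac in zip(log_observed[group], fracs)]
    for group, fracs in fractions.items()
}
corrected_lfc = mean(log_corrected["B"]) - mean(log_corrected["A"])

print(f"naive log fold change:     {naive_lfc:.3f}")
print(f"corrected log fold change: {corrected_lfc:.3f}")
```

The naive estimate is clearly non-zero even though nothing changed biologically, while the offset-corrected estimate is zero up to floating-point error.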

Best, Frederick