FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7
103 stars 26 forks source link

ANCOM-BC / missing comparisons #64

Closed JeremyTournayre closed 2 years ago

JeremyTournayre commented 2 years ago

Hello,

Thanks to have published the ANCOM-BC tool.

I wonder why ANCOM-BC doesn’t do all possible pairwise comparisons?

For example for the variables “nationality” and “BMI” in your Bioconductor webpage there is only any nationality versus “CE” and any BMI versus “Lean” (see screenshot below)? I think the choice of the versus group “CE” or “Lean” is selected alphabetically, right?

image

Cheers, Jérémy Tournayre

FrederickHuangLin commented 2 years ago

Hi @JeremyTournayre,

Thanks for your question!

Yes, you are right! In the current version of ANCOM-BC, we only compare groups with their "reference group". By default, the reference group is the first one in alphabetic order. You can change the reference group using relevel function in R.

I am aware of the need to conduct pairwise comparisons for all groups and actually am updating the ANCOM-BC function now. We expect to submit the paper in a few months.

Best, Huang

mrml500 commented 2 years ago

Hi Huang,

Many thanks for developing this package.

I am also using ANCOM-BC to look at multiple pairwise comparisons. If I re-run the function after changing the reference group (using relevel), is there a basic approach for correcting for the resulting multiple comparisons (e.g. p-value correction)? I'm aware that you are updating the function, but just wondering if there was a more basic approach that I could apply now.

All the best, Mike

LLansing commented 2 years ago

Could you please provide an example on how/where to use relevel()? e.g. on category in the phyloseq object or elsewhere?

mrml500 commented 2 years ago

This worked for me. The first command is just to make sure your categories are factors before using relevel():

sample_data(phyloseq_object)$group <- as.factor(sample_data(phyloseq_object)$group) sample_data(phyloseq_object)$group <- relevel(sample_data(phyloseq_object)$group, "groupB")

FrederickHuangLin commented 2 years ago

Hi Huang,

Many thanks for developing this package.

I am also using ANCOM-BC to look at multiple pairwise comparisons. If I re-run the function after changing the reference group (using relevel), is there a basic approach for correcting for the resulting multiple comparisons (e.g. p-value correction)? I'm aware that you are updating the function, but just wondering if there was a more basic approach that I could apply now.

All the best, Mike

Hi @mrml500,

Currently, the multiple comparison correction is taking place across taxa. For example, we have 100 taxa and two groups, A and B, if we use Bonferonni correction, then we set the alpha value for each comparison equal to alpha/100 to control the family-wise error rate (FWER) at alpha.

However, if we have three groups, A, B, and C. In the current ancombc function, we still set the alpha value for each comparison (B vs. A or C vs. A if setting A as the reference group) equal to alpha/100. Clearly, it is not sufficient to control FWER as we need to set the alpha value as alpha/200 to account for the additional comparison.

Similar issues apply to all other p-value correction methods as the multiple comparison correction is taking place only across taxa, not across different comparisons in the current ancombc function.

Back to your question, there is no simple way to correct for the resulting multiple comparisons at this moment, but I would recommend using holm method as it is more conservative than BH.

We are incorporating mixed differential false discovery rate (mdFDR) control into ancombc to solve this problem. The new update is likely to be pushed by next month.

Best, Huang

JeremyTournayre commented 1 year ago

Hello,

Thanks for all of your answers.

Like nrml500, I use relevel to change the reference group to do all comparisons 2 by 2 but maybe there is another solution since updates?

Best, Jérémy Tournayre

FrederickHuangLin commented 1 year ago

Hi @JeremyTournayre,

Yes, the new ancombc2 function has the pairwise comparison option (set pair = TRUE). Feel free to try it out!

Best, Huang

LLansing commented 1 year ago

@FrederickHuangLin, I have two questions about the newly implemented ancombc2. I have 2 factors I want to consider in my analysis: treatment (main factor) and replicate (random effect). The formula I used previously with ancombc was formula = treatment+replicate.

The treatment has 3 groups (d0, d1, d2), and I only want comparisons of d0 v. d1 and d0 v. d2. For this reason I don't believe I need to use ancombc2's new pairwise option. As I understand it, ancombc did not correctly adjust the p-value when there is more than 1 contrast, as in my situation.

My other question is about random effects. I see ancombc2 has a new option rand_formula.

Additionally, if you know of any resources I could use for better understanding how to construct regression formulas and their syntax, I would much appreciate it.


EDIT: I discovered the dunnet option, which may be the solution to my first question above. However, I'm not sure I understand the difference between using dunnet for comparisons of each treatment group to the control (i.e. d0 vs d1, d0 vs d2) and what ancombc2 is doing by default.

Secondly, I've tested out the rand_formula with the factor replicate from by above described example:

ancom_result <- ancombc2(data = phylo_obj,
                             fix_formula = 'treatment+replicate',
                             rand_formula = 'replicate',
                             p_adj_method = "BH",
                             tax_level = taxa_lvl_key[rank_symbol],
                             group = "treatment",
                             dunnet = TRUE)

and the results seem strange. It has caused the results to have every single lfc to equal 0 and all p and q values to equal 1 for all taxa. I also tried the same ancombc2 call as above but without 'replicate' in the fix_formula, but the results are the same. Am I using rand_formula incorrectly?

FrederickHuangLin commented 1 year ago

Hi @LLansing,

Thanks for your questions! Please find below my responses:

Does ancombc2 correctly adjust p-values for multiple contrasts? Or does it only do so when the pairwise option is used?

Yes, ANCOM-BC2 now implements mdFDR correction, which not only corrects multiple comparisons across taxa but also across multiple contrasts. In your case, the treatment has 3 groups (d0, d1, d2). If you only want comparisons of d0 vs. d1 and d0 vs. d2, you can set dunnet = TRUE, which will enable Dunnett’s type of test (treatments vs. control). If you are interested in comparing d0 vs. d1, d0 vs. d2, and d1 vs. d2, you can set pairwise = TRUE.

Would I remove replicate from fixed_formula and use it in rand_formula? Or would I otherwise keep it in both the fixed and random formulas, as seen in your ANCOM-BC2 vignette with the factor timepoint in section 5.2

It depends. The example shown in the vignette is a random slope model, where the timepoint in fixed effects represents an overall slope, while the timepoint in random effects adds a subject-specific slope. If a random intercept model is of your interest, you usually only need to specify the random effect in random formula. ANCOM-BC2 follows the lmerTest package in formulating the random effects. For more details, I recommend reading their paper.

I think the correct way of specifying fixed and random effects in your case is

fix_formula = 'treatment',
rand_formula = '(1 | replicate)',

Again, I would highly recommend checking lmerTest user manual or their paper to double check if the formulation represents exactly the same as your mathematical equation.

Hope it helps!

Cheers, Huang

LLansing commented 1 year ago

@FrederickHuangLin I've tried the random formula specification you've described here but I'm still getting results of all 0's for log-fold change and all 1's for p-values and q-values (and all 0's for W values, and all identical se_ values) for every taxa.

Is there any way this can be a legitimate result? It seems like something is going wrong but I'm not familiar enough with the underlying principles to recognize what could cause this.

FrederickHuangLin commented 1 year ago

Hi @LLansing,

Thanks for your feedback! Which version of ANCOMBC are you using? There is a known issue of specifying random effects in ANCOMBC v2.0.0. Can you check if your issue is the same as stated in this post?

If so, ANCOMBC v2.0.1 should address this problem.

Best, Huang

LLansing commented 1 year ago

@FrederickHuangLin, my problem was the same described in the issue you linked.

Which version of ANCOMBC are you using?

I had an outdated version. Updating the version seems to have fixed that issue. Thanks!


I am, however, seeing something strange with my Dunnet results. When I compare the output$res with output$res_dunn, they have identical lfc, se, and W values, but the p_values of the Dunnet results are all 1. I had thought that Dunnet testing would change the way P-values are adjusted for multiple comparisons, so I anticipated a difference in the q-value. I didn't anticipate a difference in P-value. Am I misunderstanding something or is it possible these p-values could indicate a bug?

FrederickHuangLin commented 1 year ago

Hi @LLansing,

The reason why you saw changes in p-values as well is that ANCOM-BC2 adopts mdFDR correction for multiple comparisons.

You can find the general procedure in the paper, but I also copy it here:

  1. Use any global testing method to obtain the screening p -values. Apply BH procedure to identify genes that are differentially expressed in at least one pairwise comparison. Let R denote the number of genes so discovered.
  2. For each gene discovered in step 1, use any two-sample testing method to obtain the p-values for each pairwise comparison and apply any mixed directional family wise error (mdFWER) controlling procedure, such as Holm, Hochberg, etc., to the pairwise p -values at level Rα/m.
  3. For a given gene discovered in step 1, if a pairwise hypothesis is rejected in step 2, then we declare the gene to be up or down regulated in the pairwise comparison according to the sign of the corresponding test statistic.

From step 1, only genes (in microbiome studies, they are taxa) that pass the global test will survive and be further corrected. Therefore, in ANCOM-BC2 Dunnett’s type of test or pairwise test, if a taxon is not globally differentially abundant (after correcting by BH procedure), p-values and q-values are automatically set to 1s, meaning that these taxa are not further analyzed.

You can also verify it by setting global = TRUE and p_adj_method = "BH". The significant taxa in res_dunn will be a subset of the significant taxa in res_global.

Best, Huang

LLansing commented 1 year ago

@FrederickHuangLin I'm trying to create the formula string for my random effects so that the random effects have random-slopes. The syntax I found for this is the following, where I include the fixed effect on the left side of the | and Lab and replicate are my 2 random effects:

'(1 + treatment| replicate) + (1 + treatment| Lab)'

but I get the all 0s and 1s result that was patched for a different circumstance in v2.0.1

Is my string formulated incorrectly or is the problem related to the bug that was patched for other random effects in v2.0.1?

P.S. I get seemingly legitimate results with the random effect formula string '(1 | replicate) + (1 | Lab)', although with this rand_formula I get the following warning:

2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| = 0.007472 (tol = 0.002, component 1) 3: Model failed to converge with 1 negative eigenvalue: -2.8e+00

Can I trust my results? How may I interpret this warning?

FrederickHuangLin commented 1 year ago

Hi @LLansing,

According to lmerTest paper, if you would like to introduce a random slope of Lab, the formula you are looking for is probably

(1 + Lab | replicate)

Please see section "4.2. The carrots data" and section "8.3 The summary method for objects returned by lmer" of lmerTest paper for reference.

Best, Huang

muhamanz commented 6 months ago

@FrederickHuangLin I am running ancombc2 with my phyloseq object using the below code

ancom_result <- ancombc2(data = phyloseq_object, fix_formula = 'AGE + GENDER + Group', rand_formula = 'Group', p_adj_method = "BH", tax_level = "Family", group = "Group", dunnet = TRUE)

In my group, I have two variables, and this is now adjusted for covariates age and gender. I run the code before anconbc: sample_data(phyloseq_object)$Group <- as.factor(sample_data(phyloseq_object)$Group) I hot got an error

Error in model.matrix.formula(formula(paste0("~", fix_formula)), data = meta_data) : data must be a data.frame.

Can you please tell me the issue? version 2.1.1

KaushJay commented 5 months ago

Hi @FrederickHuangLin , I ran the below code for differential abundance analysis, and I obtained values for all parameters except for the q intercept and q condition, which are consistently equal to 1 for all taxa. Can you please help me identify the issue?

output = ancombc2(data = pseq, assay_name = "counts", tax_level = NULL, fix_formula = "Condition", rand_formula = "(1 | Cage)", p_adj_method = "BH", pseudo = 0, pseudo_sens = TRUE, prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05, group = "Condition", struc_zero = FALSE, neg_lb = FALSE, alpha = 0.05, n_cl = 2, verbose = TRUE, global = TRUE, pairwise = TRUE, dunnet = FALSE, trend = FALSE, iter_control = list(tol = 1e-5, max_iter = 20, verbose = FALSE), em_control = list(tol = 1e-5, max_iter = 100), lme_control = lme4::lmerControl(), mdfdr_control = NULL, trend_control = NULL) res_prim = output$res

tanduc307 commented 3 months ago

@muhamanz I check the function ancombc2, and find the problem in the line: x = model.matrix(formula(paste0("~", fix_formula)), data = meta_data)

You could change: data = as.data.frame(meta_data) to get the job done.

To directly fix the function in session, use: trace(ancombc2, edit = TRUE)