FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7
106 stars 29 forks source link

Inconsistent results between e.g. `res` and `res_global` #210

Open emnilsson opened 1 year ago

emnilsson commented 1 year ago

Hi there!

I like the functions from ANCOMBC2, such as performing multigroup-comparisons, but I might be doing something wrong. I've run 2.0.2 and it worked out well, but since I discovered that it was quite severely outdated I've now run 2.2.2. However, I'm a bit concerned about the output.

I have a dataset with ASVs that originate from three different locations that are sampled at 9 timepoints, at this stage I'm primarily interested in the differences between the groups, but I like to add the dates as a random factor. Therefore, I've run:

ancombc2_fam <- ancombc2(data = ps, tax_level = "Family", fix_formula = "site", rand_formula="(1|date)",
              p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000,
              group = "site", struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, n_cl = 2, verbose = TRUE,
              global=TRUE, pairwise=TRUE, dunnet=TRUE, trend=TRUE,
              iter_control = list(tol = 1e-5, max_iter = 100, verbose = TRUE),
              em_control = list(tol = 1e-5, max_iter = 100),
              lme_control = lme4::lmerControl(),
              mdfdr_control = list(fwer_ctrl_method = "holm", B = 100),
              trend_control = list(contrast = list(matrix(c(1, 0, -1, 1), nrow = 2, byrow = TRUE), 
                                                   matrix(c(-1, 0, 1, -1), nrow = 2, byrow = TRUE)),
                                       node = list(2, 2), solver = "ECOS", B = 10))

And it runs through. In the output I get significant different families both according to diff and passed_ss if I look under res, and also if I look at res_dunn and even for res_trend. But when I look under res_global all values for W are 0, p_val are NA, q_val are 1, and diff_abn are FALSE. I get some passed_ss, though, about 20%. This is similar for when looking at res_pair, but there I get different values for W, lfc, and se, just that nothing is significant (p_val are all 1).

I'm rerunning the analysis at the moment, so not sure if I'll get warnings or not, but when I ran more or less the same thing but at lowest level (tax_level=NULL, neg_lb=FALSE) I got the following warnings from warnings():

Warning messages:
1: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  ... :
  unable to evaluate scaled gradient
2: In checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv,  ... :
  Model failed to converge: degenerate  Hessian with 1 negative eigenvalues
3: In as_lmerModLT(model, devfun) :
  Model may not have converged with 1 eigenvalue close to zero: -1.1e-11
4: In optwrap(optimizer, devfun, x@theta, lower = x@lower,  ... :
  convergence code 3 from bobyqa: bobyqa -- a trust region step failed to reduce q

And the last one was repeated 46 times. (Running on the lowest level produced the same results/no-results as on family-level.)

Any idea what is going on? Or how I can troubleshoot? I would very much want to have the res_pair output.

Cheers, Emelie.

FrederickHuangLin commented 11 months ago

Hello @emnilsson,

I appreciate you flagging this issue for me.

The notable differences you've observed before and after the major update can be reviewed in this post.

It's possible that the primary factor contributing to the discrepancies in your results is related to how zeros are handled. I would suggest trying a manual addition of 1 to your data and then rerunning ancombc2 to determine if this step helps replicate your original results. If it does, it would also be helpful to know the sparsity of your data.

Please don't hesitate to reach out if you encounter any further issues.

Best regards, Huang

emnilsson commented 10 months ago

Hi Huang!

Thank you very much for your reply, and I'm sorry for getting back to you so late. I just tried rerunning by adding a 1 to all my counts, but this did not change the outcome for res_global and res_pair (but I do keep a lot more families, going from 488 to 1293). I did not change my code in any other way, but I did upgrade to v2.4.0. Could it have something to do with my formula? Or is it likely to do with my dataset?

Cheers, Emelie.