FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7
107 stars 29 forks source link

Any non-NULL rand_formula appears to give erroneous results in ANCOMBC2. #111

Closed agrier-wcm closed 2 years ago

agrier-wcm commented 2 years ago

I noticed with my own data that if I try to include a random intercept for subject, rand_formula = "(1|Subject)", the res table in the output has all zeros in the lfc columns, a constant value around 0.5 in each of the se columns, W values of all zero, and p and q values of all one. Setting rand_formula = NULL gives normal looking results.

This same issue can be observed with the example code from the tutorial:

library(ANCOMBC)
data(dietswap)
tse = dietswap

output = ancombc2(data = tse, assay_name = "counts", tax_level = "Family",
                  fix_formula = "nationality + timepoint + group",
                  rand_formula = "(timepoint | subject)",
                  p_adj_method = "holm", pseudo = 0, pseudo_sens = TRUE,
                  prv_cut = 0.10, lib_cut = 1000, s0_perc = 0.05,
                  group = "group", struc_zero = TRUE, neg_lb = TRUE,
                  alpha = 0.05, n_cl = 2, verbose = TRUE,
                  global = TRUE, pairwise = TRUE, dunnet = TRUE, trend = TRUE,
                  iter_control = list(tol = 1e-2, max_iter = 20, 
                                      verbose = TRUE),
                  em_control = list(tol = 1e-5, max_iter = 100),
                  lme_control = lme4::lmerControl(),
                  mdfdr_control = list(fwer_ctrl_method = "holm", B = 100),
                  trend_control = list(contrast = list(matrix(c(1, 0, -1, 1),
                                                              nrow = 2, 
                                                              byrow = TRUE)),
                                       node = list(2),
                                       solver = "ECOS",
                                       B = 100))

output$res

Setting rand_formula = "(1 | subject)" gives the same issue as well. Setting it to NULL gives normal looking results.

As far as I can tell, including any random effects formula causes this issue.

This is with R version 4.2.2 and ANCOMBC version 1.6.4.

agrier-wcm commented 2 years ago

Out of curiosity I tried the regular ancom function with a rand_formula using both my own data and the data from the tutorial. The same issue exists; all W values are 0 or Inf, all beta values are 0, and all p-values are 1. Code:

library(ANCOMBC)
data(dietswap)
tse = dietswap

out = ancom(data = tse, assay_name = "counts", 
            tax_level = "Family", phyloseq = NULL, 
            p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000, 
            main_var = "group",
            adj_formula = "nationality + timepoint", 
            rand_formula = "~ timepoint | subject", 
            lme_control = list(maxIter = 100, msMaxIter = 100, opt = "optim"), 
            struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, n_cl = 2)

head(out$p_data)
head(out$beta_data)

out$res

Again, setting rand_formula = NULL gives normal looking results.

This is with lmerTest 3.1-3 and lme4 1.1-31.

FrederickHuangLin commented 2 years ago

Hi @agrier-wcm,

Thanks for reporting the BUG! I really appreciate it!

I figured out the problem with ancombc2 and will push a bug-fix commit by tomorrow. As for ancom, the issue is due to a typo in the tutorial. The correct syntax should be the following:

set.seed(123)
out = ancom(data = tse, assay_name = "counts", 
            tax_level = "Family", phyloseq = NULL, 
            p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000, 
            main_var = "group",
            adj_formula = "nationality + timepoint", 
            rand_formula = "(timepoint | subject)", 
            lme_control = lme4::lmerControl(), 
            struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, n_cl = 2)

res = out$res

Thanks again for your feedback! I will keep you posted when the changed is submitted.

Best, Huang

agrier-wcm commented 2 years ago

@FrederickHuangLin

Thanks for your reply. I look forward to the ancombc2 bug fix.

This may be a separate issue, but even with the correct rand_formula syntax, I couldn't get the ancom function to work with my data. I kept getting the following error message:

Error in { : task 1 failed - "subscript out of bounds"

I looked through the code a little bit, and I think the problem is somewhere in this else statement: https://github.com/FrederickHuangLin/ANCOMBC/blob/dbf4fb5c1434b551b087644836596a05f9660780/R/ancom.R#L314

I don't get an error using the corrected tutorial code you provided, but I believe the problem arises when the main_var factor has only two levels. For example, I do get this same error with the tutorial code and example dataset if I set main_var = "sex", like this:

library(ANCOMBC)
data(dietswap)
tse = dietswap

out = ancom(data = tse, assay_name = "counts", 
            tax_level = "Family", phyloseq = NULL, 
            p_adj_method = "holm", prv_cut = 0.10, lib_cut = 1000, 
            main_var = "sex",
            adj_formula = "nationality + timepoint", 
            rand_formula = "(timepoint | subject)",
            lme_control = lme4::lmerControl(),
            struc_zero = TRUE, neg_lb = TRUE, alpha = 0.05, n_cl = 2)

As far as I can tell, this error only occurs when there is a two level main variable and a random effects formula.

Thanks for all your work on this promising package.

FrederickHuangLin commented 2 years ago

Hi @agrier-wcm,

My apologies for the buggy codes. It seems I should have performed more careful unit tests before transferring the whole framework of ANCOMBC from phyloseq to tse.

I just pushed a bug-fix commit. I hope the new version (devel version 2.1.1, release version 2.0.1 on Bioconductor 3.16, should be available after 2 business days) of ancom and ancombc2 could solve the problem.

Let me know how it works!

Best, Huang

agrier-wcm commented 2 years ago

Thank you very much @FrederickHuangLin !

As far as I can tell, all of these issues are resolved in release version 2.0.1 on Bioconductor 3.16.