FrederickHuangLin / ANCOMBC

Differential abundance (DA) and correlation analyses for microbial absolute abundance data
https://www.nature.com/articles/s41467-020-17041-7
108 stars 29 forks source link

Warning messages: 1: In vcov.merMod(model) : Computed variance-covariance matrix problem: not a positive definite matrix; returning NA matrix #132

Closed gabrielet closed 1 year ago

gabrielet commented 1 year ago

I am having the warning reported in the question.

I found some info regarding glmer (one, two, and three) but I am unable to understand what is causing the issue.

It might be a problem with a too low number of iterations, as suggested in two, however i am unsure regarding what to do with ancombc2 parameters.

Or, it can be a problem of collinearity of the covariates I am using (as suggested here) but I did many other tests using mixed-effect models and none of them had problems so I am unsure whether this may be the case (i don't think so).

If there are some suggestions, they are more than welcome.

UPDATE, a few days after posting.

i believe the problem is a statistical one, due to the nature of the data and the model i am using to test for differential abundance.

problem is that i end up with too few samples to perform the analysis on, and this may be the reason why i am getting the warning but i am not 100% sure about this. i am still investigating.

FrederickHuangLin commented 1 year ago

Hi @gabrielet,

I was wondering whether you have figured out the issue.

If not, could you share the warning messages and a reproducible coding example here?

Thank you, Huang

gabrielet commented 1 year ago

@FrederickHuangLin, i think I figured this out, actually. Not sure though and I was still running some code but it looks like the problem is in the model I am using:

fixedf <- "Treatment * TimePoint"
randf <- "1|CollectionSite"

apparently, using the 1|collection_site is causing issues of statistical nature to the model which is unable to converge. if i use the model as:

fixedf <- "Treatment * TimePoint"
randf <- "CollectionSite"       <- here i am not including the intercepts

then ANCOMBC works without any problem.

FrederickHuangLin commented 1 year ago

Hi @gabrielet,

It seems the rand_formula appears to be incorrect.

ANCOM-BC2 follows the lmerTest package in formulating the random effects, and the correct syntax would be rand_formula = "(1|CollectionSite)". Note that it is important to specify the parenthesis.

Best, Huang

gabrielet commented 1 year ago

hello @FrederickHuangLin, and thank you for your reply.

so, to clarify the question, i did a test. I run the same ancombc analysis with a different formulation for the random effects, see the randf variable:

# set fixed effects
fixedf <- "Treatment * TimePoint"
# set random effects
randf <- "(CollectionSite)" # parenthesis

# run ancombc
abc_res <- ancombc2(data=phyloseq_obj, assay_name="counts", tax_level="Phylum" fix_formula=fixedf, rand_formula=randf, group="TimePoint", pairwise=T, p_adj_method="fdr", n_cl = 8, verbose=T)

then:

# set fixed effects
fixedf <- "Treatment * TimePoint"
# set random effects
randf <- "CollectionSite" # no parenthesis

# run ancombc
abc_res <- ancombc2(data=phyloseq_obj, assay_name="counts", tax_level="Phylum" fix_formula=fixedf, rand_formula=randf, group="TimePoint", pairwise=T, p_adj_method="fdr", n_cl = 8, verbose=T)

and:

# set fixed effects
fixedf <- "Treatment * TimePoint"
# set random effects
randf <- "(1|CollectionSite)" # collectionsite and intercept

# run ancombc
abc_res <- ancombc2(data=phyloseq_obj, assay_name="counts", tax_level="Phylum" fix_formula=fixedf, rand_formula=randf, group="TimePoint", pairwise=T, p_adj_method="fdr", n_cl = 8, verbose=T)

in the first two examples, i.e. randf <- (CollectionSite) and randf <- CollectionSite i end up with the same results (at least it looks like so) meaning that, as far as i can tell by reading the docs of lmer, the parenthesis are needed only when more terms are specified, and then they have to be separated by the pipe symbol, i.e. |.

however, when i run with the intercept term, i.e. randf <- (1|CollectionSite), i end up with the mentioned error, meaning that if i want to add the intercepts the model breaks probably because of some statistical reasons which i can't figure out (maybe too little samples to run the model).

FrederickHuangLin commented 1 year ago

Hi @gabrielet,

Thanks for testing different scenarios and providing feedback.

You are probably right about the parenthesis, it might not be necessary, but I think vertical bars are needed for specifying the random effects, otherwise, the model will treat them as additional fixed effects. For instance, in your example, if you specify randf <- "CollectionSite", the lmer function will treat CollectionSite as another fixed effect which is probably not what you want.

I went through the user manual for lmer function you provided. In the Argument section for formula, they did not call out the usage of parenthesis, but they did emphasize the usage of vertical bars:

formula a two-sided linear formula object describing both the fixed-effects and random-effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars (|) separating expressions for design matrices from grouping factors. Two vertical bars (||) can be used to specify multiple uncorrelated random effects for the same grouping variable. (Because of the way it is implemented, the ||-syntax works only for design matrices containing numeric (continuous) predictors; to fit models with independent categorical effects, see dummy or the lmer_alt function from the afex package.)

As for the error message you encountered, I agree with you, this is a statistical problem.

The error message you are seeing is related to the variance-covariance matrix of a linear mixed-effects model. The error message “not a positive definite matrix” indicates that the matrix is not positive definite, which means that some of the eigenvalues of your correlation matrix are not positive numbers. This error can occur when there is a problem with the data or model specification.

One possible solution to this problem is to check if there are any missing values in your data and remove them if necessary. Another solution is to check if there are any multicollinearity issues in your data. You can also try using different optimization algorithms or increasing the number of iterations.

I hope this helps! Let me know if you have any other questions.

Huang

gabrielet commented 1 year ago

@FrederickHuangLin i am unsure about the pipe symbol and whether i understood what you suggested. so, i tried to run the following:

fixedf <- "Treatment * TimePoint"
#fixedf <- "nifH+nosZ+nirS"
randf <- "|CollectionSite" # here i put the pipe before the random effects

# run ancombc
abc_res <-ancombc2(data=phylo_xna[[xna]], assay_name="counts", tax_level=taxa_level, fix_formula=fixedf, rand_formula=randf, group="TimePoint", pairwise=T, p_adj_method="fdr", n_cl = 8, verbose=T)

and i got an error stating that the pipe was not needed.

Error in str2lang(x) : <text>:1:28: unexpected '|'
1: y ~ Treatment * TimePoint+ |

hopefully passing the random effects without the pipe works as expected but i am unsure about how to actually verify it.

FrederickHuangLin commented 1 year ago

Hi @gabrielet,

I believe both of the following specifications, randf <- "(CollectionSite)" and randf <- "CollectionSite" will treat randf as the fixed effect, not a random intercept. The correct way of specifying the random intercept is either randf <- "(1|CollectionSite)" or randf <- "1|CollectionSite".

As you mentioned earlier, the correct specification led to an error of “not a positive definite matrix”, indicating that the matrix is not positive definite, which means that some of the eigenvalues of your correlation matrix are not positive numbers. This error can occur when there is a problem with the data or model specification.

I think this is more of a statistical issue with your data rather than a software issue. I suggest fitting the mixed effects model to your data using lmerTest (do not implement ancombc2 at this step) using the same fixed effects and random effects you used for the ancombc2 function, and see if you encounter the same error message. Usually, they will give you more informative error messages.

Best, Huang

gabrielet commented 1 year ago

@FrederickHuangLin, sorry for the late reply. ok, i will go through this again and see what happens. will update the post as soon as possible!

edit: running ancombc2 with the wrong formula may cause another error which i posted in this thread.