Closed gabrielet closed 1 year ago
Hi @gabrielet,
I was wondering whether you have figured out the issue.
If not, could you share the warning messages and a reproducible coding example here?
Thank you, Huang
@FrederickHuangLin, i think I figured this out, actually. Not sure though and I was still running some code but it looks like the problem is in the model I am using:
fixedf <- "Treatment * TimePoint"
randf <- "1|CollectionSite"
apparently, using the 1|collection_site
is causing issues of statistical nature to the model which is unable to converge. if i use the model as:
fixedf <- "Treatment * TimePoint"
randf <- "CollectionSite" <- here i am not including the intercepts
then ANCOMBC works without any problem.
Hi @gabrielet,
It seems the rand_formula
appears to be incorrect.
ANCOM-BC2 follows the lmerTest
package in formulating the random effects, and the correct syntax would be rand_formula = "(1|CollectionSite)"
. Note that it is important to specify the parenthesis.
Best, Huang
hello @FrederickHuangLin, and thank you for your reply.
so, to clarify the question, i did a test. I run the same ancombc analysis with a different formulation for the random effects, see the randf
variable:
# set fixed effects
fixedf <- "Treatment * TimePoint"
# set random effects
randf <- "(CollectionSite)" # parenthesis
# run ancombc
abc_res <- ancombc2(data=phyloseq_obj, assay_name="counts", tax_level="Phylum" fix_formula=fixedf, rand_formula=randf, group="TimePoint", pairwise=T, p_adj_method="fdr", n_cl = 8, verbose=T)
then:
# set fixed effects
fixedf <- "Treatment * TimePoint"
# set random effects
randf <- "CollectionSite" # no parenthesis
# run ancombc
abc_res <- ancombc2(data=phyloseq_obj, assay_name="counts", tax_level="Phylum" fix_formula=fixedf, rand_formula=randf, group="TimePoint", pairwise=T, p_adj_method="fdr", n_cl = 8, verbose=T)
and:
# set fixed effects
fixedf <- "Treatment * TimePoint"
# set random effects
randf <- "(1|CollectionSite)" # collectionsite and intercept
# run ancombc
abc_res <- ancombc2(data=phyloseq_obj, assay_name="counts", tax_level="Phylum" fix_formula=fixedf, rand_formula=randf, group="TimePoint", pairwise=T, p_adj_method="fdr", n_cl = 8, verbose=T)
in the first two examples, i.e. randf <- (CollectionSite)
and randf <- CollectionSite
i end up with the same results (at least it looks like so) meaning that, as far as i can tell by reading the docs of lmer, the parenthesis are needed only when more terms are specified, and then they have to be separated by the pipe symbol, i.e. |
.
however, when i run with the intercept term, i.e. randf <- (1|CollectionSite)
, i end up with the mentioned error, meaning that if i want to add the intercepts the model breaks probably because of some statistical reasons which i can't figure out (maybe too little samples to run the model).
Hi @gabrielet,
Thanks for testing different scenarios and providing feedback.
You are probably right about the parenthesis, it might not be necessary, but I think vertical bars are needed for specifying the random effects, otherwise, the model will treat them as additional fixed effects. For instance, in your example, if you specify randf <- "CollectionSite"
, the lmer
function will treat CollectionSite
as another fixed effect which is probably not what you want.
I went through the user manual for lmer
function you provided. In the Argument section for formula
, they did not call out the usage of parenthesis, but they did emphasize the usage of vertical bars:
formula a two-sided linear formula object describing both the fixed-effects and random-effects part of the model, with the response on the left of a ~ operator and the terms, separated by + operators, on the right. Random-effects terms are distinguished by vertical bars (|) separating expressions for design matrices from grouping factors. Two vertical bars (||) can be used to specify multiple uncorrelated random effects for the same grouping variable. (Because of the way it is implemented, the ||-syntax works only for design matrices containing numeric (continuous) predictors; to fit models with independent categorical effects, see dummy or the lmer_alt function from the afex package.)
As for the error message you encountered, I agree with you, this is a statistical problem.
The error message you are seeing is related to the variance-covariance matrix of a linear mixed-effects model. The error message “not a positive definite matrix” indicates that the matrix is not positive definite, which means that some of the eigenvalues of your correlation matrix are not positive numbers. This error can occur when there is a problem with the data or model specification.
One possible solution to this problem is to check if there are any missing values in your data and remove them if necessary. Another solution is to check if there are any multicollinearity issues in your data. You can also try using different optimization algorithms or increasing the number of iterations.
I hope this helps! Let me know if you have any other questions.
Huang
@FrederickHuangLin i am unsure about the pipe symbol and whether i understood what you suggested. so, i tried to run the following:
fixedf <- "Treatment * TimePoint"
#fixedf <- "nifH+nosZ+nirS"
randf <- "|CollectionSite" # here i put the pipe before the random effects
# run ancombc
abc_res <-ancombc2(data=phylo_xna[[xna]], assay_name="counts", tax_level=taxa_level, fix_formula=fixedf, rand_formula=randf, group="TimePoint", pairwise=T, p_adj_method="fdr", n_cl = 8, verbose=T)
and i got an error stating that the pipe was not needed.
Error in str2lang(x) : <text>:1:28: unexpected '|'
1: y ~ Treatment * TimePoint+ |
hopefully passing the random effects without the pipe works as expected but i am unsure about how to actually verify it.
Hi @gabrielet,
I believe both of the following specifications, randf <- "(CollectionSite)"
and randf <- "CollectionSite"
will treat randf
as the fixed effect, not a random intercept. The correct way of specifying the random intercept is either randf <- "(1|CollectionSite)"
or randf <- "1|CollectionSite"
.
As you mentioned earlier, the correct specification led to an error of “not a positive definite matrix”, indicating that the matrix is not positive definite, which means that some of the eigenvalues of your correlation matrix are not positive numbers. This error can occur when there is a problem with the data or model specification.
I think this is more of a statistical issue with your data rather than a software issue. I suggest fitting the mixed effects model to your data using lmerTest (do not implement ancombc2
at this step) using the same fixed effects and random effects you used for the ancombc2
function, and see if you encounter the same error message. Usually, they will give you more informative error messages.
Best, Huang
I am having the warning reported in the question.
I found some info regarding
glmer
(one, two, and three) but I am unable to understand what is causing the issue.It might be a problem with a too low number of iterations, as suggested in two, however i am unsure regarding what to do with
ancombc2
parameters.Or, it can be a problem of collinearity of the covariates I am using (as suggested here) but I did many other tests using mixed-effect models and none of them had problems so I am unsure whether this may be the case (i don't think so).
If there are some suggestions, they are more than welcome.
UPDATE, a few days after posting.
i believe the problem is a statistical one, due to the nature of the data and the model i am using to test for differential abundance.
problem is that i end up with too few samples to perform the analysis on, and this may be the reason why i am getting the warning but i am not 100% sure about this. i am still investigating.