Closed GMBog closed 2 years ago
Hello, thanks for your comment. It's a bit hard to diagnose the problem but your error message suggests that the error comes from the computation of Omega
in the C++ part of the code, suggesting that the variance matrix Sigma
is not full rank when you include too many fixed effects.
One way to fix the problem is to constrain Sigma
to be diagonal and/or spherical (rather than just symmetric positive definite) to make inversion easier. You can do so with the syntax:
my_pln <- PLN(Abundance ~ factor_1 + factor_2 + factor_3, data = my_data, control = list(covariance = "diagonal"))
This is a strong constraint but it's hard to advise anything else without looking at the data.
Hello, thank you for your prompt reply. I am working with an abundance matrix of 795 samples and 2059 OTUs. I have four different fixed effects such as year, sequencing run, lactation number and rumen sampling order.
The error message occurred when I used PLNLDA grouping by a different fixed effect not included in the model. I had no problems with PLN or PLNPCA.
I will try your suggestion to include a restriction for sigma.
Could you share your design matrix with fake count data (or just the design matrix and let me generate fake count data) ? I'm a bit puzzled as to why the optimizer works for PLN and PLNPCA but not for PLNLDA. It might a problem during the post-processing (which differs for the three methods).
For future references, the problem arises when there is a discrete factor among the fixed effect. The (silent) removal of the intercept term leads to binary coding of the first factor (from the fixed effect)
> data <- data.frame(grouping = c("A", "A", "B", "B"), fixef = c("a", "b", "a", "b"))
> model.matrix(~ 0 + fixef, data = data)
fixefa fixefb
1 1 0
2 0 1
3 1 0
4 0 1
which interferes with the later binary coding for the grouping factor as the full model matrix is not full rank anymore
> cbind(model.matrix(~ 0 + grouping, data = data),
+ model.matrix(~ 0 + fixef, data = data))
groupingA groupingB fixefa fixefb
1 1 0 1 0
2 1 0 0 1
3 0 1 1 0
4 0 1 0 1
This can be fixed either by:
1/ removing the first column from the covariates model matrix (only if xlevels
is not NULL)
2/ forcing an intercept in the formula when building the model matrix but removing it during model fitting
In both cases, some adaptations to the predict()
functions are required.
@jchiquet unless you have a different opinion, I find 2/ slightly cleaner and will fix it that way.
@jchiquet unless you have a different opinion, I find 2/ slightly cleaner and will fix it that way.
Sounds good, thank you @mahendra-mariadassou for handling this!
Thank you very much for your prompt response to my questions.
2/ forcing an intercept in the formula when building the model matrix but removing it during model fitting Do I need to modify the PLNLDA function, or can I do it directly in RStudio with the original function of the package?
2/ forcing an intercept in the formula when building the model matrix but removing it during model fitting Do I need to modify the PLNLDA function, or can I do it directly in RStudio with the original function of the package?
No, I need to fix the bug from our side. I should hopefully be done next week and will let know. You'll need to develop the development version of the package before the issue is fixed on CRAN.
Hi,
The problem should be solved in the current version of the package, which you install on your computer with:
remotes::install_github("pln-team/PLNmodels")
Let me know if you still encounter problems.
When I use the PLN function with a model with nested fixed effects, or with more than four fixed effects, I get the following error message:
warning: inv_sympd(): given matrix is not symmetric
error: inv_sympd(): matrix is singular or not positive definite Error in optimizer(list(Theta = private$Theta, M = private$M, S = sqrt(private$S2)), : inv_sympd(): matrix is singular or not positive definite
Which I believe is due to the fixed effects matrix being used. How is it possible to solve this?