PLN-team / PLNmodels

A collection of Poisson lognormal models for multivariate count data analysis
https://pln-team.github.io/PLNmodels
GNU General Public License v3.0
54 stars 18 forks source link

Singular matrix or not positive definite #89

Closed GMBog closed 2 years ago

GMBog commented 2 years ago

When I use the PLN function with a model with nested fixed effects, or with more than four fixed effects, I get the following error message:

warning: inv_sympd(): given matrix is not symmetric

error: inv_sympd(): matrix is singular or not positive definite Error in optimizer(list(Theta = private$Theta, M = private$M, S = sqrt(private$S2)), : inv_sympd(): matrix is singular or not positive definite

Which I believe is due to the fixed effects matrix being used. How is it possible to solve this?

mahendra-mariadassou commented 2 years ago

Hello, thanks for your comment. It's a bit hard to diagnose the problem but your error message suggests that the error comes from the computation of Omega in the C++ part of the code, suggesting that the variance matrix Sigma is not full rank when you include too many fixed effects.

One way to fix the problem is to constrain Sigma to be diagonal and/or spherical (rather than just symmetric positive definite) to make inversion easier. You can do so with the syntax:

my_pln <- PLN(Abundance ~ factor_1 + factor_2 + factor_3, data = my_data, control = list(covariance = "diagonal"))

This is a strong constraint but it's hard to advise anything else without looking at the data.

GMBog commented 2 years ago

Hello, thank you for your prompt reply. I am working with an abundance matrix of 795 samples and 2059 OTUs. I have four different fixed effects such as year, sequencing run, lactation number and rumen sampling order.

The error message occurred when I used PLNLDA grouping by a different fixed effect not included in the model. I had no problems with PLN or PLNPCA.

I will try your suggestion to include a restriction for sigma.

mahendra-mariadassou commented 2 years ago

Could you share your design matrix with fake count data (or just the design matrix and let me generate fake count data) ? I'm a bit puzzled as to why the optimizer works for PLN and PLNPCA but not for PLNLDA. It might a problem during the post-processing (which differs for the three methods).

mahendra-mariadassou commented 2 years ago

For future references, the problem arises when there is a discrete factor among the fixed effect. The (silent) removal of the intercept term leads to binary coding of the first factor (from the fixed effect)

> data <- data.frame(grouping = c("A", "A", "B", "B"), fixef = c("a", "b", "a", "b"))
> model.matrix(~ 0 + fixef, data = data)
  fixefa fixefb
1      1      0
2      0      1
3      1      0
4      0      1

which interferes with the later binary coding for the grouping factor as the full model matrix is not full rank anymore

> cbind(model.matrix(~ 0 + grouping, data = data), 
+       model.matrix(~ 0 + fixef, data = data))
  groupingA groupingB fixefa fixefb
1         1         0      1      0
2         1         0      0      1
3         0         1      1      0
4         0         1      0      1

This can be fixed either by: 1/ removing the first column from the covariates model matrix (only if xlevels is not NULL) 2/ forcing an intercept in the formula when building the model matrix but removing it during model fitting In both cases, some adaptations to the predict() functions are required.

@jchiquet unless you have a different opinion, I find 2/ slightly cleaner and will fix it that way.

jchiquet commented 2 years ago

@jchiquet unless you have a different opinion, I find 2/ slightly cleaner and will fix it that way.

Sounds good, thank you @mahendra-mariadassou for handling this!

GMBog commented 2 years ago

Thank you very much for your prompt response to my questions.

2/ forcing an intercept in the formula when building the model matrix but removing it during model fitting Do I need to modify the PLNLDA function, or can I do it directly in RStudio with the original function of the package?

mahendra-mariadassou commented 2 years ago

2/ forcing an intercept in the formula when building the model matrix but removing it during model fitting Do I need to modify the PLNLDA function, or can I do it directly in RStudio with the original function of the package?

No, I need to fix the bug from our side. I should hopefully be done next week and will let know. You'll need to develop the development version of the package before the issue is fixed on CRAN.

mahendra-mariadassou commented 2 years ago

Hi,

The problem should be solved in the current version of the package, which you install on your computer with:

remotes::install_github("pln-team/PLNmodels")

Let me know if you still encounter problems.