JenniNiku / gllvm

Generalized Linear Latent Variable Models
https://jenniniku.github.io/gllvm/

model.matrix versus lv.X #172

Closed AlainZuur closed 3 months ago

AlainZuur commented 3 months ago

Hello,

This is the CRAN version.

Model:

M4 <- gllvm(y = SpecData, X = CovX.s,
        num.lv = 0, num.RR = 2,
        lv.formula = ~ ANG + Altitude + LogSlope + MMDischarge + pH +
                       Calcium + Nitrate + Ammonium + DisOxygen + OxygenDemand,
        family = "poisson",
        control = list(optimizer = "optim"), #alabama
        control.start = list(n.init = 5, jitter.var = 0.1))

Xm <- model.matrix(~ ANG + Altitude + LogSlope + MMDischarge + pH +
                     Calcium + Nitrate + Ammonium + DisOxygen + OxygenDemand,
                   data = CovX.s)
X <- as.matrix(Xm)[,-1] #' Minus the intercept

M4$lv.X

Is there any reason why the first column in X and M4$lv.X differ? ANG has only 4 or 5 unique values. If I remove ANG, rerun the model, then everything is identical.

Same issue if I put ANG at the end.

M4$lv.X[, "ANG"]
 1  2  3  4  5  6  7  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
 0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  1  1  1  2  2  2  0  0  0  2  3  4
29 30
 4  5

X[,"ANG"]
          1           2           3           4           5           6
-0.63674995 -0.63674995 -0.63674995 -0.63674995 -0.63674995 -0.63674995
          7           9          10          11          12          13
-0.63674995 -0.63674995 -0.63674995 -0.63674995 -0.63674995 -0.63674995
         14          15          16          17          18          19
-0.63674995 -0.63674995 -0.63674995  0.04716666  0.04716666  0.04716666
         20          21          22          23          24          25
 0.73108328  0.73108328  0.73108328 -0.63674995 -0.63674995 -0.63674995
         26          27          28          29          30
 0.73108328  1.41499989  2.09891651  2.09891651  2.78283312

And it is numeric.

I know that the DHARMa package decides to do its own thing if it sees a covariate with only a limited number of unique values. Does gllvm do something similar?

Alain
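
A column-by-column comparison is one way to pin down exactly which covariates differ between the two matrices. The sketch below is a minimal illustration on made-up data: `raw`, `lvX`, and the helper `compare_columns()` are hypothetical names, and `lvX` is a hand-built stand-in for `M4$lv.X`, not actual gllvm output.

```r
# Made-up covariates standing in for the real data (not the actual CovX.s)
raw <- data.frame(ANG = c(0, 0, 1, 2, 5),
                  Altitude = c(10, 20, 15, 30, 25))
CovX.s <- as.data.frame(scale(raw))

# Design matrix without the intercept column
X <- model.matrix(~ ANG + Altitude, data = CovX.s)[, -1]

# Hand-built stand-in for M4$lv.X: ANG is deliberately left unstandardized
lvX <- X
lvX[, "ANG"] <- raw$ANG

# Hypothetical helper: for each shared column, are the values (nearly) equal?
compare_columns <- function(A, B) {
  shared <- intersect(colnames(A), colnames(B))
  vapply(shared,
         function(nm) isTRUE(all.equal(unname(A[, nm]), unname(B[, nm]))),
         logical(1))
}

res <- compare_columns(X, lvX)
print(res)
```

With the real objects, the call would be `compare_columns(X, M4$lv.X)`; any column reported as `FALSE` is one where the two matrices disagree.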

AlainZuur commented 3 months ago

It is the lv.formula. If I drop it, then it is all ok.

I missed the "(for latent variables)" in the help file. I assume that this means: not for constrained latent variables?

In other words, if I want to run an RRR with 2 CLVs and 0 LVs, I have to use y = Y and X = X, and that's it? And put any interactions inside X?

Or is this a bug?

Alain

BertvanderVeen commented 3 months ago

Not quite sure I follow?

AlainZuur commented 3 months ago

Let me do it for the spiders...then you can see it.

AlainZuur commented 3 months ago

I can't reproduce it for the spider data. But this is the issue:

[image]

They match without the lv.formula argument.

Can I just double check: lv.formula can be used to specify the covariates in the CLV with RRR?

Alain

BertvanderVeen commented 3 months ago

Definitely, that's the whole point. What's the output of str(CovX.s), str(M4$lv.X) and str(M4$lv.X.design)?

AlainZuur commented 3 months ago

It is a name thing. If I use the name ANG2 then M4$lv.X contains the same data for the selected covariates as CovX.s. And all is fine.

Is it possible that in your gllvm code, you are using a variable with the name ANG? Something that interacts with the formula argument.

It is the CRAN version.

From my point of view, I will just change the name...solves the problem.

To answer your question, see below. Kind regards,

Alain

str(CovX.s)
'data.frame':  29 obs. of  11 variables:
 $ ANG         : num  -0.637 -0.637 -0.637 -0.637 -0.637 ...
 $ Altitude    : num  1.72 1.71 1.64 1.42 1.4 ...
 $ LogSlope    : num  3.012 0.654 0.832 0.709 0.428 ...
 $ MMDischarge : num  -1.23 -1.22 -1.17 -1.13 -1.13 ...
 $ pH          : num  -0.84 -0.273 1.426 -0.273 0.293 ...
 $ Calcium     : num  -2.388 -2.681 -1.979 -0.809 -0.107 ...
 $ Nitrate     : num  -1.06 -1.06 -1.04 -1.05 -0.83 ...
 $ Ammonium    : num  -0.5512 -0.2917 -0.4214 -0.5512 -0.0322 ...
 $ DisOxygen   : num  1.236 0.375 0.466 0.692 -0.667 ...
 $ OxygenDemand: num  -0.595 -0.8 -0.389 -0.955 0.305 ...
 $ ANG2        : num  -0.637 -0.637 -0.637 -0.637 -0.637 ...

str(M4$lv.X)
 num [1:29, 1:2] -0.637 -0.637 -0.637 -0.637 -0.637 ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:29] "1" "2" "3" "4" ...
  ..$ : chr [1:2] "ANG2" "Altitude"

BertvanderVeen commented 3 months ago

No, there is nothing with the name "ANG" and even if I simulate some data with a variable "ANG" and a variable "Altitude" everything goes like it should. Is your data available somewhere so that I can attempt to reproduce the issue?
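
For reference, base R resolves the variables in a formula by looking in `data` first and then in the formula's environment, which is one general way a same-named workspace object can end up in a design matrix. The sketch below illustrates only that base R mechanism with made-up data (`d`, `d2`, and the workspace vector `ANG` are hypothetical); it is not a confirmed explanation of what happens here.

```r
# A workspace object that happens to share its name with a covariate
ANG <- c(0, 1, 2, 3)

# Case 1: the data frame has no ANG column, so the formula falls back
# to the workspace object of that name
d <- data.frame(Altitude = c(1.5, -0.5, 0.2, -1.2))
mm <- model.matrix(~ ANG + Altitude, data = d)
print(mm[, "ANG"])   # values come from the workspace vector

# Case 2: the data frame has its own ANG column, so that column is used
d2 <- data.frame(Altitude = d$Altitude, ANG = c(-1, 0, 0, 1))
mm2 <- model.matrix(~ ANG + Altitude, data = d2)
print(mm2[, "ANG"])  # values come from d2$ANG
```

A quick check along these lines is `exists("ANG")` in the session where the model is fitted, which reports whether an object of that name is visible outside the data frame.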

AlainZuur commented 3 months ago

Let me set up a Posit RStudio project with the data and the code that produces the error. I will send an invite.