JenniNiku / gllvm

Generalized Linear Latent Variable Models
https://jenniniku.github.io/gllvm/
48 stars 20 forks source link

Different theta estimates from previous versions #49

Closed RodolfoPelinson closed 2 years ago

RodolfoPelinson commented 3 years ago

I am unsuccessfully trying to reproduce the exact results I got in the previous version 1.3.0 with the new version 1.3.1. Specifically, I am getting different (and high!) values of the theta parameters for some species.

I wonder if this is an issue, a bug fix, or just some default setting that changed from one version to another. The only difference in the default settings that I noticed was the addition of the lv.formula, num.lv.c, num.RR, dist, and corWithin arguments in the gllvm function. Maybe it has something to do with the new possibility of adding structured random row effects. Additionally, the seed argument seems to not be working. If this is not an issue, can you enlighten me on what is the difference between the two different sets of estimates?

This is an example of the theta parameters of a small data set without environmental covariates.

#Version 1.3.0
install.packages("https://cran.r-project.org/src/contrib/Archive/gllvm/gllvm_1.3.0.tar.gz", repos=NULL, type="source", dependencies = TRUE)

fit_gllvm_no_effect_SS1 <- gllvm(com_SS1,family = "negative.binomial", method = "VA",
                                 row.eff = F,n.init = 20, num.lv = 2, seed = 3)

> fit_gllvm_no_effect_SS1$params$theta
                              LV1         LV2
Berosus                0.18806254  0.00000000
Chironominae           0.75592025  0.10913879
Culex                 -0.06431379  0.82487536
Microvelia            -0.12998731  0.20099789
Pantala                0.06872666  0.26967053
Physalaemos nattereri  0.07604585 -1.14269377
Rhantus                0.15400443 -0.07519812
Scinax sp.             0.26405024 -0.18103823

#Version 1.3.1
install.packages("gllvm")

fit_gllvm_no_effect_SS1 <- gllvm(com_SS1,family = "negative.binomial", method = "VA",
                                 row.eff = F,n.init = 20, num.lv = 2, seed = 3)
Error in set.seed(seed) : the specified seed is not a valid integer

fit_gllvm_no_effect_SS1 <- gllvm(com_SS1,family = "negative.binomial", method = "VA",
                                 row.eff = F,n.init = 20, num.lv = 2)

> fit_gllvm_no_effect_SS1$params$theta
                             LV1         LV2
Berosus                1.0000000   0.0000000
Chironominae           4.0194930   1.0000000
Culex                 -0.3422043   7.5553350
Microvelia            -0.6912662   1.8409825
Pantala                0.3653821   2.4699821
Physalaemos nattereri  0.4043929 -10.4691614
Rhantus                0.8188979  -0.6887171
Scinax sp.             1.4041836  -1.6570731
BertvanderVeen commented 3 years ago

Thanks for posting your issue!

In the last update a lot of changes were made in the r package, including the parameterization of the model. The theta coefficients are now relative to the diagonal of that matrix being one, for compatibility with model-based constrained ordination (the num.lv.c and num.RR arguments), and a scale parameter was added for the LVs, which you can examine with the summary function, so the number of parameters in the model has remained the same.

The issue you're having with the seed is because you're only specifying a single seed value, while asking for 20 model runs. If that would work, all 20 model runs would give the same result and you would wrongly get the impression you tried different starting values. So, if n.init=20, the length of seed should also be 20. The function should probably provide a more intuitive error message if the length of seed is not the same as n.init.

RodolfoPelinson commented 3 years ago

Thank you so much for the quick answer! It does make sense.

JenniNiku commented 3 years ago

I made some corrections to the documentation so that they explain correctly those thetas in this commit

I also made some changes to the code in this commit, so that it does not return error if seed length is 1 in case of n.init>1. In this case, n.init seed numbers are sampled between 1-10000 like this:

    set.seed(seed) # this is the original seed number, with length(seed)=1
    seed <- sample(1:10000, n.init)

I think this was the way some time ago before, as well, but changed at some point. In case of seed=NULL, the first line is ignored and if seed length is same as n.init, the given seeds are used. So these are coming in the next update to CRAN

BertvanderVeen commented 3 years ago

Thanks Jenni. FYI I still have a few minor bugfixes/changes on a separate branch that I will need to put in a PR for at some point.