JenniNiku / gllvm

Generalized Linear Latent Variable Models
https://jenniniku.github.io/gllvm/

Grouped dispersion parameters #54

Closed BertvanderVeen closed 2 years ago

BertvanderVeen commented 2 years ago

This PR adds an option to group dispersion parameters across responses. disp.group is a vector of indices that can be provided in gllvm(.), for example disp.group = c(rep(1,5),rep(2,5)), similar to e.g. the cut-offs for the ordinal model. I've found this useful for datasets with few sites, and when (say) analyzing specific species groups simultaneously (my example concerns pollinators and plants).
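As a rough illustration of how such an index vector might be built from group labels: the sketch below uses base R only, and the names species_group and the labels "pollinator"/"plant" are made up for the example, not taken from the PR. The resulting vector has the same form as the disp.group example above.

```r
# Hypothetical group labels for p = 10 response columns (illustrative only)
species_group <- c(rep("pollinator", 5), rep("plant", 5))

# One integer index per column; columns with the same index are meant to
# share a single dispersion parameter. Indices start at 1, in order of
# first appearance of each label.
disp.group <- match(species_group, unique(species_group))

print(disp.group)
print(table(disp.group))
```

The vector would then be passed as the disp.group argument of gllvm(.), as described in the comment above.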

There are also some small bugfixes for the ordiplot function, when using arrows and species colors.

JenniNiku commented 2 years ago

Nice addition. But could you modify the implementation a bit? I recently learned that this kind of thing can also be done with the map argument. For example, if we would like to have the same dispersion parameter for all species, all we need to do is set map$lg_phi = factor(rep(0, p)), and we do not need to modify the .cpp code. In this case the lg_phi we give to MakeADFun would still be of length p (the number of species), but the first element (at position 0 in cpp counting) would be estimated and mapped to all species. Similarly for other variations: e.g. with p = 10, map$lg_phi = factor(c(rep(0, p/2), rep(1, p/2))) would map the first element to the first 5 species and the second element to the last 5 species. This way we keep the .cpp file cleaner.
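To make the mapping idea concrete, here is a minimal base-R sketch of what such a factor expresses. It does not call TMB and is not TMB's internal code; the variable names and the numeric values in free_lg_phi are assumptions chosen for illustration.

```r
p <- 10

# Map factor from the comment above: first 5 species share one level,
# last 5 species share another.
map_lg_phi <- factor(c(rep(0, p / 2), rep(1, p / 2)))

# Number of dispersion parameters that would actually be estimated
n_free <- nlevels(map_lg_phi)

# Hypothetical estimates for the free parameters (illustrative values)
free_lg_phi <- c(-0.5, 1.2)

# Expand the free parameters back to one value per species using the
# factor's integer codes (1 for the first level, 2 for the second).
lg_phi_full <- free_lg_phi[as.integer(map_lg_phi)]

print(n_free)
print(lg_phi_full)

# A single shared parameter for all species has only one level
map_shared <- factor(rep(0, p))
print(nlevels(map_shared))
```

The full vector keeps length p, while only as many values as there are factor levels are free.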

BertvanderVeen commented 2 years ago

This latest commit should cover your request, but please double-check as usual :). The mapping factors don't need to start at zero, as TMB takes care of that, but for other reasons I require that they start at one anyway.
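A small base-R check of why the starting label should not matter: the grouping is carried by the factor's integer codes, which in R start at 1 whatever the labels are. The variable names here are illustrative.

```r
p <- 10

# Same grouping, labelled from zero and from one
map_from_zero <- factor(c(rep(0, p / 2), rep(1, p / 2)))
map_from_one  <- factor(c(rep(1, p / 2), rep(2, p / 2)))

# The labels differ...
print(levels(map_from_zero))
print(levels(map_from_one))

# ...but the underlying integer codes, which define the grouping, agree
same_grouping <- identical(as.integer(map_from_zero),
                           as.integer(map_from_one))
print(same_grouping)
```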

Here I also note that constrained ordination is prone to overfitting when a dataset has few sites, even without quadratic responses and with a negative-binomial distribution. In these cases, grouping dispersion parameters should especially be considered (and even more so if all species exhibit similar overdispersion patterns).