JenniNiku / gllvm

Generalized Linear Latent Variable Models
https://jenniniku.github.io/gllvm/

ordination plotting in 1D? #89

Closed gooner255 closed 1 year ago

gooner255 commented 1 year ago

Hi, my ordination sometimes plots in 1D, seemingly at random. It happens with both unconstrained and constrained models, each with two latent variables.

Restarting R and clearing the models sometimes works, but it often happens when I return to scripts and re-run things (e.g. when customising a single element of the graphs).

What could be the issue? Any help appreciated!

The screenshot shows what it looks like; the code used is below.

[Screenshots, 2023-02-07: the ordination plot and the code used (images not preserved)]

BertvanderVeen commented 1 year ago

Hello! This can happen in a few cases, especially when the dataset only supports a single latent variable, or when the model fits the data poorly in general (e.g., when overfitting). A negative binomial distribution has a tendency to draw information away from the ordination, which can exaggerate the problem. If that is the case, fitting your model with a Poisson distribution and random row-effects might solve it.

It can also indicate convergence problems, in which case trying different starting values (argument starting.val) or different sets of initial values (argument n.init) might resolve it.
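As a rough illustration of the starting-value suggestion, the sketch below fits the same model from the default start and from several random starts. The data are simulated purely for illustration. It also assumes a gllvm release in which `starting.val` and `n.init` are passed through the `control.start` list; older releases accepted them as direct arguments.

```r
library(gllvm)

# Simulated overdispersed counts driven by two latent variables (illustrative only).
set.seed(1)
n <- 40; m <- 12
z   <- matrix(rnorm(n * 2), n, 2)              # latent variables per sample
gam <- matrix(rnorm(m * 2, sd = 0.5), m, 2)    # species loadings
y   <- matrix(rnbinom(n * m, mu = exp(1 + z %*% t(gam)), size = 2), n, m)
colnames(y) <- paste0("sp", seq_len(m))

# Default: starting values derived from residuals of species-wise GLMs ("res").
fit_res <- gllvm(y, family = "negative.binomial", num.lv = 2)

# Alternative: three random starts; the run with the highest log-likelihood is kept.
fit_multi <- gllvm(y, family = "negative.binomial", num.lv = 2,
                   control.start = list(starting.val = "random", n.init = 3))

# Compare the maximised log-likelihoods of the two fits.
c(res = as.numeric(logLik(fit_res)), multi = as.numeric(logLik(fit_multi)))
```

If the two log-likelihoods differ noticeably, the default start probably ended in a poorer local optimum.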

gooner255 commented 1 year ago

Thanks Bert!

I dropped species with <3 across all samples; maybe that will help with overfitting? NB was needed, unfortunately.

I'd like more than 1 LV for plotting purposes. The AICc of the 1-LV model is 30-40 lower than that of the 2-LV model, but I would have thought this isn't that meaningful? Also, a model with 3 LVs seems to plot fine. The starting values seem to have helped a bit.
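The comparison described here can be sketched as follows, on simulated data rather than the poster's data. It uses `AICc()` as exported by gllvm; lower values indicate better support, and the differences are what matter rather than the absolute values.

```r
library(gllvm)

# Simulated counts with two latent variables (illustrative only).
set.seed(1)
n <- 40; m <- 12
z   <- matrix(rnorm(n * 2), n, 2)
gam <- matrix(rnorm(m * 2, sd = 0.5), m, 2)
y   <- matrix(rnbinom(n * m, mu = exp(1 + z %*% t(gam)), size = 2), n, m)
colnames(y) <- paste0("sp", seq_len(m))

# Fit the same model with 1, 2 and 3 latent variables.
fits <- lapply(1:3, function(k) gllvm(y, family = "negative.binomial", num.lv = k))

# AICc per model, expressed as the difference from the best (lowest) value.
aicc <- sapply(fits, AICc)
names(aicc) <- paste0("num.lv=", 1:3)
aicc - min(aicc)
```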

Also, regarding the row effects: if I had two sets of samples, with each site sampled at two time points (so two samples per site), is the structured random row-effects argument the way to account for this?

So f is the structured grouping matrix of samples, grouped by location: ......studyDesign = f, row.eff = ~ (1|site)....

Or do I need a random term (1 | site) in the model also?

BertvanderVeen commented 1 year ago

We all want the ideal result, but what we get is what the data supports.

A random row-effect + Poisson distribution will account for overdispersion as well, just in a slightly different way that might work better in your case. Whether an NB is "necessary" always needs to be checked with residual diagnostics, and the same goes for whether the row-effect + Poisson option works better (or not).

For the NB, make sure to have a look at 'gradient.check' to diagnose convergence issues. One thing you could try with the NB is to group the dispersion parameters (which are per species by default), to simplify the model a bit. Have a look at the disp.formula argument for that.

It is possible that if a model with 2 LVs fits poorly, weird things will happen that will not happen with a more flexible model (3 LVs here). However, 3 LVs + NB + predictors + row-effects is a tough model to fit, and a lot of information (i.e., many species/sites) is needed to accurately estimate all parameters.
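A minimal sketch of these NB suggestions, on simulated data. It relies on two assumptions about the installed gllvm version: that `gradient.check = TRUE` makes gllvm warn about large gradients even when the optimiser reports convergence, and that `disp.formula` accepts a vector of group indices (here a single group, so all species share one dispersion parameter).

```r
library(gllvm)

# Simulated overdispersed counts (illustrative only).
set.seed(1)
n <- 40; m <- 12
z   <- matrix(rnorm(n * 2), n, 2)
gam <- matrix(rnorm(m * 2, sd = 0.5), m, 2)
y   <- matrix(rnbinom(n * m, mu = exp(1 + z %*% t(gam)), size = 2), n, m)
colnames(y) <- paste0("sp", seq_len(m))

# NB model with one shared dispersion parameter and a gradient check.
fit_nb <- gllvm(y, family = "negative.binomial", num.lv = 2,
                gradient.check = TRUE,
                disp.formula = rep(1, ncol(y)))

# Residual diagnostics: residuals vs. linear predictors and a normal Q-Q plot.
plot(fit_nb, which = 1:2)
```

Repeating the diagnostic plots for a Poisson model with random row-effects gives the side-by-side check of whether the NB is really needed.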

With the structured random-effect, I would suggest just trying it to see whether, and how well, it works. 'studyDesign' should be a character, indicating the column in X that contains "site" in the row-effect (I think). But this specification is new and I am not completely sure; @JenniNiku can probably answer this better, as she worked on that recently.
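For reference, one way a structured row-effect can be specified is sketched below. It assumes a gllvm version in which `studyDesign` is a data frame holding the grouping variable named in the `row.eff` formula; as the reply above notes, this interface was new at the time, so it is worth checking the help page of the installed version. The data and the `site` variable are simulated.

```r
library(gllvm)

# Simulated design: 20 sites, each sampled twice (illustrative only).
set.seed(2)
n_site <- 20; m <- 12
site <- factor(rep(seq_len(n_site), each = 2))
n <- length(site)

z   <- matrix(rnorm(n * 2), n, 2)
gam <- matrix(rnorm(m * 2, sd = 0.5), m, 2)
u   <- rnorm(n_site, sd = 0.5)[as.integer(site)]   # shared site-level effect
y   <- matrix(rpois(n * m, exp(1 + u + z %*% t(gam))), n, m)
colnames(y) <- paste0("sp", seq_len(m))

# One random row-effect per site rather than one per sample.
fit_site <- gllvm(y, family = "poisson", num.lv = 2,
                  row.eff = ~ (1 | site),
                  studyDesign = data.frame(site = site))
```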

gooner255 commented 1 year ago

Hi Bert, thanks for the quick reply!

Yes, I went with the NB because of the residuals. The row-effect is now random, and the model now has 1 LV, as per the criteria and your assessment. The gradient.check is a really useful option!

With the row effects, I guess my question (which was not very clear) was: is the structured row-effect argument just pooling the abundance of each row that belongs to a given group, and then modelling each species' relative abundance based on this total group count? If that's all it's doing, and with NULL models not having a formula argument, how could one account for autocorrelation between samples?

Thanks again!

BertvanderVeen commented 1 year ago

Yes, that is what the row-effect is doing. If I get your question right, you are wondering whether it is possible to include i.i.d. row-effects in the model and still account for autocorrelation. I don't believe that is possible at the moment: only one row-effect is currently supported.

I would really urge you to try to fit a model with row.eff="random" and family="poisson" with two latent variables, and see how it fits. Let me know if there is anything else I can help you with!
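The model suggested here could be fitted and inspected roughly as follows. The data are simulated for illustration only.

```r
library(gllvm)

# Simulated overdispersed counts with two latent variables (illustrative only).
set.seed(1)
n <- 40; m <- 12
z   <- matrix(rnorm(n * 2), n, 2)
gam <- matrix(rnorm(m * 2, sd = 0.5), m, 2)
y   <- matrix(rnbinom(n * m, mu = exp(1 + z %*% t(gam)), size = 2), n, m)
colnames(y) <- paste0("sp", seq_len(m))

# Poisson response, random row-effects, two latent variables.
fit_pois <- gllvm(y, family = "poisson", num.lv = 2, row.eff = "random")

# Ordination of the two latent variables, with species loadings as a biplot.
ordiplot(fit_pois, which.lvs = c(1, 2), biplot = TRUE)

# Residual diagnostics to judge how well the model fits.
plot(fit_pois, which = 1:2)
```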

gooner255 commented 1 year ago

Hi Bert, I'm just after the composition in each sample (so a single random row-effect seems appropriate), and I want to add sampling date as a factor in the model. With only two dates, I think it has to be a fixed factor. This way, if I compare a null model to one with sampling date added, would the latter reduce the patterns in the ordination and increase the variation explained?

BertvanderVeen commented 1 year ago

Yes, that sounds about right :+1:.
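A sketch of that comparison on simulated data, where `date` is a hypothetical two-level factor. The "variation explained" line follows the approach in the gllvm vignettes: compare the trace of the residual covariance matrix from `getResidualCov()` between the two models.

```r
library(gllvm)

# Simulated counts with a sampling-date effect and two latent variables.
set.seed(3)
n <- 40; m <- 12
d    <- rep(c(0, 1), times = n / 2)             # two sampling dates
X    <- data.frame(date = factor(d, labels = c("t1", "t2")))
beta <- rnorm(m, sd = 0.7)                      # species-specific date effects
z    <- matrix(rnorm(n * 2), n, 2)
gam  <- matrix(rnorm(m * 2, sd = 0.5), m, 2)
y    <- matrix(rpois(n * m, exp(1 + outer(d, beta) + z %*% t(gam))), n, m)
colnames(y) <- paste0("sp", seq_len(m))

# Null model versus a model with sampling date as a fixed factor.
fit_null <- gllvm(y, family = "poisson", num.lv = 2, row.eff = "random")
fit_date <- gllvm(y, X = X, formula = ~ date, family = "poisson",
                  num.lv = 2, row.eff = "random")

# Share of the residual covariation that the date factor accounts for.
1 - getResidualCov(fit_date)$trace / getResidualCov(fit_null)$trace

# The ordination of fit_date shows the patterns left after accounting for date.
ordiplot(fit_date, which.lvs = c(1, 2))
```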

hrlai commented 1 year ago

Hi, just dropping by to say that this also sometimes happens with a binomial GLMM. I have found that falling back to the Laplace approximation or the logit link helps to avoid the second LV shrinking almost entirely to zero... but I don't know why, or what I'm doing.
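The workaround described here could look like the sketch below, on simulated presence/absence data. It assumes that `method = "LA"` selects the Laplace approximation and that `link = "logit"` is accepted together with it. The per-LV standard deviation is only a crude check for a latent variable that has collapsed towards zero.

```r
library(gllvm)

# Simulated presence/absence data with two latent variables (illustrative only).
set.seed(4)
n <- 60; m <- 12
z    <- matrix(rnorm(n * 2), n, 2)
gam  <- matrix(rnorm(m * 2), m, 2)
y_pa <- matrix(rbinom(n * m, size = 1, prob = plogis(z %*% t(gam))), n, m)
colnames(y_pa) <- paste0("sp", seq_len(m))

# Default fit: variational approximation with the probit link.
fit_va <- gllvm(y_pa, family = "binomial", num.lv = 2)

# Fallback: Laplace approximation with the logit link.
fit_la <- gllvm(y_pa, family = "binomial", num.lv = 2,
                method = "LA", link = "logit")

# Spread of the predicted site scores per latent variable.
rbind(VA = apply(fit_va$lvs, 2, sd), LA = apply(fit_la$lvs, 2, sd))
```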