JenniNiku / gllvm

Generalized Linear Latent Variable Models
https://jenniniku.github.io/gllvm/

Error when installing the development version of gllvm from github #71

Open cgoetsch opened 2 years ago

cgoetsch commented 2 years ago

When I try to install the development version of the package with:

library(devtools)
devtools::install_github("JenniNiku/gllvm")

I get the following error code:

cc1plus.exe: out of memory allocating 796224 bytes
make: *** [C:/PROGRA~1/R/R-41~1.2/etc/i386/Makeconf:245: gllvm.o] Error 1
ERROR: compilation failed for package 'gllvm'

I am currently running 64-bit Windows 10 on a 3.80 GHz processor with 95 GB of usable RAM. No other programs were open or running during the installation except RStudio, and only 9% of memory was being utilized.

BertvanderVeen commented 2 years ago

I have noticed similar behavior at times when installing using devtools. For some reason, memory allocation does not seem to go well when installing using devtools - I am not sure if this will happen once things are pushed to CRAN.

Out of curiosity, do you have a particular reason for wanting to install the development branch?

cgoetsch commented 2 years ago

I was running into an error when trying to run gllvm and thought that the development version might solve it. I restarted R and am not getting the error any more, but I am still having issues: I am trying to run a test model on my data and it is taking a very long time. I thought this package was optimized for very large datasets?

Here is a brief description of my data: our data are multispecies count data gleaned from digital aerial surveys. As we do not have "sites" in the traditional sense, we snapped a 4 km x 4 km grid to our study area and obtained the count for each species within each grid cell, so the centroid location of each grid cell functions as an observation unit. As a test data set, we have only 6 species groups. We have 3,361 grid cells with up to 15 surveys per grid cell, which gives us 30,641 sampling units.

Ideally, I would like to run a joint species model with gllvm, since it allows for the incorporation of effort via an offset and for structured row.effects capturing the spatial correlation of grid cells and time (season of survey); a sketch of the call I have in mind follows below. I thought that this package, being likelihood-based instead of Bayesian, would run faster, but it seems to be taking forever to run even a subset of the data. Could you perhaps assist me?
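To be concrete, something along these lines is what I am after (a sketch only; X_env and effort are placeholder names for my covariate matrix and survey effort, not objects from the package):

library(gllvm)

# Sketch of the eventual model: covariates, an effort offset, and row effects
full_m <- gllvm(
  y = y_ms,              # multispecies count matrix
  X = X_env,             # environmental covariates (placeholder)
  family = "poisson",
  offset = log(effort),  # survey effort entering as a log offset
  row.eff = "random"     # random row effects across grid cells
)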

Currently, I am only trying to run a test model with no environmental covariates, row.effects, or offset included.

test_m <- gllvm(y = y_ms, family = "poisson")

On a subset of the data with only 5,649 observational units, it has not finished even after a few hours of runtime. Is this to be expected?

BertvanderVeen commented 2 years ago

We are currently working on an update, and the master branch includes some bugs that you might run into if you try to install it. It installs fine on my Linux machine, though it seems to go better when installing through remotes than through devtools.
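For reference (a sketch, not a guaranteed fix): since the i386 path in your error message suggests it is the 32-bit build that runs out of memory, skipping the 32-bit architecture during installation may also help.

install.packages("remotes")

# Install the development version through remotes instead of devtools
remotes::install_github("JenniNiku/gllvm")

# If the 32-bit (i386) compilation is what fails, try skipping it
remotes::install_github("JenniNiku/gllvm", INSTALL_opts = "--no-multiarch")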

Do you remember what the error was?

Frequentist methods are usually faster than Bayesian ones, yes, but computation time still scales with the amount of data, so things will be slower for larger datasets.

My suggestions for now would be (see the sketch after this list):

1) Make sure to screen your data and remove sites without observations.
2) Try switching to starting.val = "zero", in case it is the starting value generation that is taking a lot of time.
3) Set num.lv = 1 (2 is the default).
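In code, suggestions 2 and 3 together would look something like this (a sketch based on your test call above):

test_m <- gllvm(
  y = y_ms,
  family = "poisson",
  starting.val = "zero",  # suggestion 2: simple starting values
  num.lv = 1              # suggestion 3: one latent variable (default is 2)
)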

See if @JenniNiku has any other suggestions.

cgoetsch commented 2 years ago

Unfortunately, I don't remember what the exact error was; I restarted R, so I lost the output and the traceback. I think it had something to do with the TMB settings: it was expecting 4 parameters and only got 3. As part of trying to get the development version installed, I updated the TMB package, so maybe that was the issue.

I have already taken all NAs (no observations and no effort) out of the data, so the remaining sites with 0 species observed are true zeros. I will try switching the starting value settings and num.lv as you suggested. I will also try narrowing the survey set even more, to get at least a test model completed.

Thanks for your advice.

BertvanderVeen commented 2 years ago

Regardless of whether they are true zeros or not, sites that have no observations can cause problems: they provide no information for the latent variables, while the model still includes site-specific hyperparameters that need to be estimated.
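In code, dropping those empty sampling units could look something like this (a sketch, assuming y_ms is your sites-by-species count matrix):

# Keep only rows (sampling units) with at least one observation
y_ms <- y_ms[rowSums(y_ms) > 0, ]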

JenniNiku commented 2 years ago

Hi, the memory allocation problem is actually the reason why I haven't updated the package on CRAN lately, as the CRAN checks do not pass because of it. That is what I have been trying to solve, so far unfortunately without success. Based on the checks I had thought it was related only to 32-bit Windows, but apparently it can occur elsewhere as well.

About your example: you could first set sd.errors = FALSE to see how long the model fitting takes. If the model fitting succeeds and converges, the standard errors can then be calculated with the se function (see the sketch below).
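Something along these lines (a sketch based on the test model above; sd.errors and se as documented in gllvm):

# Fit without standard errors first, to see how long the fitting itself takes
fit <- gllvm(y = y_ms, family = "poisson", sd.errors = FALSE)

# If the fit converged, calculate the standard errors afterwards
ses <- se(fit)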

smithja16 commented 1 year ago

I've had the memory allocation error fail a GitHub installation of gllvm_1.4.1 many times on one Windows computer, but the same installation then succeeded on another computer. The only real difference is that the succeeding computer has a newer version of R, and perhaps different versions of dependency packages.

BertvanderVeen commented 1 year ago

Yes, this seems isolated to older versions of R; I encountered it once on an older Windows computer, and it was "fixed" by updating R.

gerverska commented 4 months ago

As a counterexample, I have the same environment on two different machines (one on a computing cluster, one on my personal machine), and I am still encountering this issue on the computing cluster.