Optimal-Learning-Lab / LKT


Error: vector memory exhausted (limit reached?) #21

Open wbreilly opened 10 months ago

wbreilly commented 10 months ago

I am having trouble running models with random intercepts for student and KC due to what is evidently a hardware limitation. These models produce Error: vector memory exhausted (limit reached?). The dataframe has about 800,000 rows and 22 columns.

I tried a workaround where I created 100 folds and ran the model successfully with a truncated dataframe containing one fold. However, when I passed the full dataframe and specified cv=TRUE, I received the memory limit error once again.

I would greatly appreciate any suggestions. Thank you!

imrryr commented 10 months ago

Does it work without cv=TRUE? I have new code, to be shared soon, that handles cross-validation better. I think I can share it with you. Are you making sure to use data.table objects?

wbreilly commented 10 months ago

With the full dataframe and random intercepts, I get the same error with cv=FALSE. With the full dataframe and fixed intercepts, there are no errors whether or not cv=TRUE. These are data.table objects. My thought was that I could use CV and 100 folds to reduce the memory burden. I would be happy to try your new code.

imrryr commented 10 months ago

Sorry, I didn't read the first question carefully enough. The random effects component of the system does not handle data of that size well. Random effects are fit through the lme4 package, which is very slow and memory-hungry. Unfortunately, I didn't have a fast alternative method, and I still don't. Models with random effects require it. Models with only fixed effects use the LiblineaR package, which is super fast. Sorry for this issue. I should produce a warning message, and I will leave this issue open as a note to at least do that in a future release.
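As a side note on the error message itself: "vector memory exhausted (limit reached?)" is what R reports when its vector heap limit is hit, which is most commonly seen on macOS, where that limit is controlled by the R_MAX_VSIZE environment variable. The sketch below only shows how to inspect and raise that limit. The 32Gb value is an arbitrary example, not a recommendation, and raising the limit does not make lme4 any lighter on memory, so a large random-effects model may still fail or become very slow once the machine starts swapping.

```r
# Inspect the current setting; an empty string means R is using its default.
current_limit <- Sys.getenv("R_MAX_VSIZE")
print(current_limit)

# To raise the limit, add a line like the following to ~/.Renviron
# (the value here is only an example) and restart R:
#
#   R_MAX_VSIZE=32Gb
```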

I'd be happy to discuss alternatives to random effects, but of course theory-based analysis would require it. For practical applications, I often use tracking features like logitdec, which, over repetitions, determines the probable intercept values for the categorical levels. This works great for practical application, but might not suit your needs.
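To make the idea of a tracking feature concrete, here is a minimal sketch of a logitdec-style predictor: the log ratio of exponentially decayed prior successes to decayed prior failures, computed for each trial from the trials before it. This is an illustration only, not LKT's implementation. The function name `logitdec_sketch`, the `decay` default of 0.9, and the smoothing constant `prior` are all assumptions made for the example.

```r
# Hypothetical helper for illustration; not part of the LKT package.
# outcomes: 0/1 vector of one student's (or one KC's) trials in time order.
# decay:    per-trial decay applied to older trials (assumed value).
# prior:    smoothing constant added to both counts so the log ratio is
#           defined before any data has been seen (assumed value).
logitdec_sketch <- function(outcomes, decay = 0.9, prior = 1) {
  n <- length(outcomes)
  feature <- numeric(n)
  succ <- 0  # decayed count of prior successes
  fail <- 0  # decayed count of prior failures
  for (i in seq_len(n)) {
    # The feature for trial i uses only trials 1..(i - 1).
    feature[i] <- log((prior + succ) / (prior + fail))
    # Decay the running counts, then add the current outcome.
    succ <- decay * succ + outcomes[i]
    fail <- decay * fail + (1 - outcomes[i])
  }
  feature
}

# Two made-up students: one mostly correct, one mostly incorrect.
strong_student <- c(1, 1, 1, 0, 1)
weak_student   <- c(0, 0, 1, 0, 0)
print(logitdec_sketch(strong_student))
print(logitdec_sketch(weak_student))
```

Because the feature drifts upward for a mostly correct student and downward for a mostly incorrect one, a single fixed-effect coefficient on it can stand in for a per-student intercept, so the model can stay on the fast fixed-effects path.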