Closed ejlundgren closed 1 month ago
Since version 3.0, groundhog has explicitly taken parallel processing into account. It is the reason a major architectural aspect of groundhog was modified: groundhog now works by moving pkgs in and out of the default personal library (instead of calling those pkgs from dynamically set paths to additional libraries). This way, pkgs loaded with groundhog in one core are available for loading in all cores without making additional calls to groundhog. In light of this, two thoughts.
First, I think a better practice might be to leave the groundhog command outside the foreach loop and call the pkgs explicitly by namespace within the foreach, something like
foreach(...) %dopar% data.table::fread(...)
It should be more efficient and easier to read, but these are often matters of taste.
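To make the namespacing idea concrete, here is a minimal sketch of calling a function as package::function inside parallel workers without attaching anything in the workers. It is an illustration under assumptions, not the code from this thread: it uses parallel::parLapply() and utils::read.csv() as base-R stand-ins for foreach and data.table so that it runs without extra packages, and the CSV files are made-up demo data.

```r
library(parallel)

# Made-up demo data: three small CSV files in a temporary folder
tmp <- tempfile("csv_demo_")
dir.create(tmp)
files <- file.path(tmp, paste0("part_", 1:3, ".csv"))
for (i in seq_along(files)) {
  write.csv(data.frame(x = i, y = i^2), files[i], row.names = FALSE)
}

# Two local workers; nothing is attached with library() inside them
cl <- makeCluster(2)

# The function is called by namespace, so each worker finds it
# in whatever version of the package its library paths provide
ys <- parLapply(cl, files, function(f) {
  d <- utils::read.csv(f)
  d$y
})

stopCluster(cl)
unlist(ys)
```

With foreach the body would look the same, only wrapped in foreach(...) %dopar% { ... } and with the namespaced call pointing at the package you actually need.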
Second, I don't think the errors you are getting are the result of using groundhog or foreach; they seem instead related to the .rds files. So, a question: if you load the .rds files with a non-parallel for() loop, or if you use library() instead of groundhog.library() to load the required packages, does the code still produce the errors you are reporting? I think it will. But if it does not, that is, if the errors only arise with groundhog, I would give this a closer look.
Dear Uri,
Thanks so much for your fast response.
Just to clarify, namespacing the packages with package::function should do the trick? Does this mean I should not load packages in each core with foreach(..., .packages = c("xxx", "xxx"))?
I guess I don't know how the foreach .packages call works: whether it is shorthand for running library("data.table") in each core (i.e., not groundhog version controlled) or whether it transfers the packages in the Global Environment (i.e., the groundhog version controlled package) to each core. I hope that's clear.
And, sorry, I should have clarified. The .Rds files load just fine when I load them sequentially or when I load packages in the foreach .packages = c("xxx") call. They only fail to load (and only about 5% fail) when I called groundhog directly in the body of the foreach loop.
Got it. Actually, in any case you should not need to call groundhog within the loop; you should run it as if you were using library() instead of groundhog.library(). Put the groundhog.library() call before the foreach, then pass the pkgs through the .packages argument of foreach. Something like this:
# Before the loop
groundhog.day <- "2024-03-01"
libs <- c("metafor", "data.table", "dplyr",
          "tidyr", "broom", "multcomp")
groundhog.library(libs, groundhog.day)

# Loop
posthoc_comp_out <- foreach(i = 1:nrow(posthoc_comps),
                            .packages = libs,
                            .errorhandling = "pass") %dopar% {
  m <- readRDS(posthoc_comps[i, ]$path)
  # Bunch of other things...
}
If it does not work, try again with a date prior to two recent updates of data.table. Try 2024-02-15 and, if it still fails, 2024-01-15.
If that still fails, try this maximally similar code with non-parallel looping:
groundhog.day <- "2024-03-01"
libs <- c("metafor", "data.table", "dplyr",
          "tidyr", "broom", "multcomp")
groundhog.library(libs, groundhog.day)

# Loop (sequential; note the loop variable i)
for (i in seq_len(nrow(posthoc_comps))) {
  m <- readRDS(posthoc_comps[i, ]$path)
  # Bunch of other things...
}
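Since only some files fail, it may help if the sequential loop keeps going after an error and records which files failed, similar in spirit to .errorhandling = "pass" in foreach. Below is a minimal sketch of that idea using tryCatch(). The three demo files (one deliberately corrupt) are hypothetical stand-ins for the paths in posthoc_comps.

```r
# Hypothetical stand-in for posthoc_comps$path:
# three files in a temporary folder, the second one is not a valid .rds
tmp <- tempfile("rds_demo_")
dir.create(tmp)
paths <- file.path(tmp, paste0("model_", 1:3, ".rds"))
saveRDS(list(id = 1), paths[1])
writeLines("not an rds file", paths[2])
saveRDS(list(id = 3), paths[3])

# Sequential loop that stores the error object instead of stopping
results <- vector("list", length(paths))
for (i in seq_along(paths)) {
  results[[i]] <- tryCatch(readRDS(paths[i]), error = function(e) e)
}

# Flag the elements that hold an error and list the offending files
failed <- vapply(results, inherits, logical(1), what = "error")
paths[failed]
```

If the same files fail every time in the sequential loop, the files themselves are the likely suspects; if the failures only show up in the parallel run, the package versions loaded in the workers deserve a closer look.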
The error is produced by data.table, so I think that somehow you end up using different versions of it when you put the groundhog call in the loop vs. when you do it the other way.
Let's hope one of these ideas gets to it.
Thanks a ton, that works! I just wanted to verify that that would load the groundhog versioned package into each core and not the default library version. Much appreciated!!
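For anyone who wants to check this directly: as far as I understand the foreach documentation, .packages loads the listed packages on each worker, so the version a worker gets is whatever is installed in the library paths it searches. A minimal sketch for comparing the version seen by the workers with the one in the main session is below. It is an assumption-laden illustration: it uses a base parallel cluster rather than doSNOW, and "stats" is only a placeholder so the example runs anywhere; substitute the package you care about (e.g. "data.table") after the groundhog.library() call.

```r
library(parallel)

pkg <- "stats"  # placeholder; replace with e.g. "data.table"

cl <- makeCluster(2)

# Ask every worker to load the package and report its version and location
worker_info <- clusterCall(cl, function(pkg) {
  library(pkg, character.only = TRUE)
  list(version = as.character(packageVersion(pkg)),
       path    = find.package(pkg))
}, pkg)

stopCluster(cl)

# Compare with the version available in the main session
main_version    <- as.character(packageVersion(pkg))
worker_versions <- vapply(worker_info, function(w) w$version, character(1))
versions_match  <- all(worker_versions == main_version)
versions_match
```

The path element shows which library folder each worker loaded the package from, which is handy if the versions turn out to differ.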
First, thank you for this wonderful package. I am preaching to everyone I know to use it!
I am writing to see if there is any documentation about using groundhog with foreach loops, or if this is an active realm of package development. I experimented with loading libraries inside the foreach loop with groundhog, which almost appears to work, but produces random errors. In this case, I am loading already saved models (.Rds files) and performing posthoc tests on them.
I do not know how to make this issue reproducible on your end...
A simplified version of the code that works:
Loading libraries inside foreach with groundhog:
This code produces these errors for random elements of the output list, but inconsistently:
My R version is 4.3.2, groundhog version 3.2.0, foreach version 1.5.2, doSNOW version 1.0.20, parallel version 4.3.2