Interestingly, running an R script from a shell script in Docker on Azure returns the right value.
R/test_avail.R:

```r
require(future)
fc <- file("/home/data/test_log.txt")
writeLines(c("Number of cores", future::availableCores()), fc)
close(fc)
```

bin/test_avail.sh:

```bash
#!/bin/bash
Rscript "R/test_avail.R"
```

/home/data/test_log.txt:

```
Number of cores
64
```
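(As an aside, if it would help to see which detection method that 64 is coming from, `availableCores()` can report every method's value rather than just the minimum - a minimal sketch, assuming the `which = "all"` argument:)

```r
# Sketch: log every core count availableCores() detects (system, cgroups,
# env vars, ...) rather than just the minimum, to see what constrains it.
require(future)
fc <- file("/home/data/test_log.txt")
counts <- future::availableCores(which = "all")
writeLines(c("Cores detected per method:",
             paste(names(counts), counts, sep = ": ")), fc)
close(fc)
```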
I can repeat the original symptoms though - still waiting for the run to progress to confirm whether it uses the idle cores.
Interesting. It could perhaps be how I am setting up the multicore usage (https://github.com/epiforecasts/covid-rt-estimates/blob/cf469e22e51e4f049a3b113f4fcaf22709d25751/R/utils.R#L2) or potentially it's not forcing stan to use 1 core correctly in EpiNow2. There should only be a single parallel call in EpiNow2 (in `regional_epinow`) with any remaining cores used by `estimate_infections` to run multiple MCMC chains.
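For reference, one way to write that split down with future's nested topologies - a sketch only, with illustrative worker numbers, not necessarily how `setup_future` or EpiNow2 actually configure it:

```r
library(future)

# Sketch: outer level distributes regions across jobs, inner level gives
# each region a share of the remaining cores for its MCMC chains.
jobs  <- 4                        # illustrative number of regions run in parallel
cores <- availableCores()
plan(list(
  tweak(multisession, workers = jobs),
  tweak(multisession, workers = max(1, cores %/% jobs))
))
```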
What I find very odd is that everything works when used interactively!
there's some weirdness in that the multicore stuff often has catches on it to change how it runs interactively because RStudio gets in a grump with multicore. I have seen this in a few places:

```r
if (!interactive()) {
  options(future.fork.enable = TRUE)
}
```

and also found reference to some multithreading libs having similar protections.
Yes - I'm adding that to force it to use forking when run in a script from RStudio - otherwise it forces it off. I would have thought it wouldn't make a difference here as it's just running in bash on a Linux server!
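i.e. something along these lines (a sketch of the intent, not the exact code in covid-rt-estimates):

```r
# Sketch: only enable forked (multicore) futures when not running
# interactively, since RStudio discourages forking; fall back to
# multisession otherwise.
if (!interactive()) {
  options(future.fork.enable = TRUE)
  future::plan(future::multicore)
} else {
  future::plan(future::multisession)
}
```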
Not sure what's changed but the latest version of EpiNow2 seems to be fully utilising the cores (wasn't quick enough to catch it early when they were all running).
Interesting!
So I am still seeing only some cores working when jobs > cores, but when jobs < cores I get optimal (or near-optimal) usage as I would expect (with each region getting cores / regions cores).
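To put rough numbers on the expected behaviour (illustrative only):

```r
# Illustrative only: expected per-region core allocation.
cores <- 64
regions <- 8                 # jobs < cores: each region should get 64 / 8 = 8 cores
cores %/% regions            # 8
regions <- 100               # jobs > cores: each region gets 1 core, none should sit idle
max(1, cores %/% regions)    # 1
```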
I don't think there have been any changes to impact this as I have been mainly focussing on trying to get the website linked in with these estimates (and actually quite distracted by some UK work).
If you have some time to look at this I think a sensible debug would be to run something else that uses the future setup (i.e. one of their examples) and see if the `setup_future` function I have written is driving this weird behaviour or if it is internal to EpiNow2.
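Something along these lines could act as that check - a dummy workload pushed through the same future setup with no EpiNow2 involved (a sketch assuming the future.apply package; `setup_future()` would need to be called beforehand exactly as the pipeline does):

```r
library(future.apply)

# Assumes the plan has already been set up as in the pipeline,
# e.g. by sourcing R/utils.R and calling setup_future() with its usual arguments.
res <- future_lapply(1:20, function(i) {
  Sys.sleep(30)              # keep each worker busy long enough to watch htop/top
  list(pid = Sys.getpid(), cores = future::availableCores())
})
# How many distinct worker processes actually did work:
table(vapply(res, function(x) x$pid, integer(1)))
```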
I might extend the logging work once it's in to spit out some debug messages about how many cores it is using etc.
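Even something minimal would do - a sketch, with `jobs` standing in for the number of regions and message() standing in for whatever logger ends up being used:

```r
# Sketch only: `jobs` is a placeholder for the number of regions being run.
cores <- future::availableCores()
message(sprintf(
  "Detected %d cores; running %d jobs with up to %d cores each",
  cores, jobs, max(1, cores %/% jobs)
))
```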
Well, that looks quite convincing!
So checking a run with everything on the most recent master version I still see this but to a lesser extent. Given I recently adapted the `setup_future` function this indicates to me that it is the source of the problem.
I think this is fixed so closing for now
When running an update in Docker on an Azure cluster only half of the available cores are used.

Cores are allocated using `setup_future` in R/utils.R. This uses `future::availableCores()` internally and should default to all cores when jobs > cores; when jobs < cores the remaining cores should be shared between jobs and used to run multiple MCMC chains. Local tests indicate all of these features are working as intended outside of Azure/script use in Docker.