epiforecasts / covid-rt-estimates

National and subnational estimates of the time-varying reproduction number for Covid-19
https://epiforecasts.io/covid/
MIT License

1/2 number of cores used on Azure in Docker #1

Closed seabbs closed 4 years ago

seabbs commented 4 years ago

When running an update in docker on an Azure cluster only half of the available cores are used.

Cores are allocated using setup_future in R/utils.R. This uses future::availableCores() internally and should default to using all cores (one per job) when jobs > cores; when jobs < cores, the remaining cores should be shared between jobs and used to run multiple MCMC chains. Local tests indicate all of these features work as intended outside of Azure/script use in Docker.
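The intended split can be sketched as follows (a hypothetical simplification; allocate_cores is an illustrative name, and the real logic lives in setup_future in R/utils.R):

```r
# Hypothetical sketch of the core-sharing rule described above; the real
# implementation is setup_future in R/utils.R.
allocate_cores <- function(jobs, cores) {
  if (jobs >= cores) {
    # More jobs than cores: every core runs one job at a time.
    list(workers = cores, cores_per_job = 1)
  } else {
    # Fewer jobs than cores: split the spare cores between jobs so each
    # job can run several MCMC chains in parallel.
    list(workers = jobs, cores_per_job = floor(cores / jobs))
  }
}

allocate_cores(8, cores = 4)  # workers = 4, cores_per_job = 1
allocate_cores(2, cores = 8)  # workers = 2, cores_per_job = 4
```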

joeHickson commented 4 years ago

Interestingly, running an R script from a shell script in Docker on Azure returns the right value.

R/test_avail.R:

require(future)
# write the core count future can see to a log file
fc <- file("/home/data/test_log.txt")
writeLines(c("Number of cores", future::availableCores()), fc)
close(fc)

bin/test_avail.sh:

#!/bin/bash

Rscript "R/test_avail.R"

/home/data/test_log.txt:

Number of cores
64

I can repeat the original symptoms though - still waiting for the run to progress to confirm if it uses the empty cores.

seabbs commented 4 years ago

Interesting. It could perhaps be how I am setting up the multicore usage (https://github.com/epiforecasts/covid-rt-estimates/blob/cf469e22e51e4f049a3b113f4fcaf22709d25751/R/utils.R#L2) or potentially it's not forcing stan to use 1 core correctly in EpiNow2. There should only be a single parallel call in EpiNow2 (in regional_epinow) with any remaining cores used by estimate_infections to run multiple MCMC chains.
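The nesting described above can be approximated with future's nested topology (a sketch only, assuming the future package; the worker counts are illustrative and not what setup_future actually computes):

```r
library(future)

# Sketch of a two-level topology: an outer layer of workers for regions,
# with the remaining cores handed to each region for its MCMC chains.
plan(list(
  tweak(multisession, workers = 4),  # regions run in parallel
  tweak(multisession, workers = 2)   # chains within each region
))
```

If EpiNow2 is not pinning stan to a single core per chain on top of this, the layers would oversubscribe or undersubscribe the machine.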

What I find very odd is that everything works when used interactively!

joeHickson commented 4 years ago

there's some weirdness in that the multicore stuff often has guards on it that change how it runs interactively, because RStudio gets in a grump with multicore. I have seen this in a few places:

if (!interactive()) {
  # If running as a script, enable forking
  options(future.fork.enable = TRUE)
}

and also found references to some multithreading libs having similar protections.

seabbs commented 4 years ago

Yes - I'm adding that to force it to use forking when run in a script from RStudio - otherwise forking is forced off. I would have thought it wouldn't make a difference here as it's just running in bash on a Linux server!

joeHickson commented 4 years ago

Not sure what's changed but the latest version of EpiNow2 seems to be fully utilising the cores (wasn't quick enough to catch it early when they were all running): [screenshot]

seabbs commented 4 years ago

Interesting!

So I am still seeing only some cores working when jobs > cores, but when cores > jobs I get optimal (or near-optimal) usage as I would expect (with each region getting cores / regions cores).

I don't think there have been any changes that would impact this, as I have mainly been focussing on trying to get the website linked in with these estimates (and have actually been quite distracted by some UK work).

If you have some time to look at this, I think a sensible debug step would be to run something else that uses the future setup (e.g. one of their examples) and see whether the setup_future function I have written is driving this weird behaviour or whether it is internal to EpiNow2.
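A minimal check along those lines might look like this (a sketch assuming the future and future.apply packages; the comment about setup_future refers to the project's own helper in R/utils.R):

```r
library(future)
library(future.apply)

# Stand-in for the project's setup: swap this line for a call to
# setup_future() from R/utils.R to test the real allocation.
plan(multisession, workers = 2)

# Run a trivial parallel workload and record which process handled each task.
pids <- future_sapply(1:8, function(i) Sys.getpid())
message("Distinct workers used: ", length(unique(pids)))
```

If this toy workload spreads across the expected number of workers but EpiNow2 does not, the problem is internal to EpiNow2 rather than the setup.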

joeHickson commented 4 years ago

[screenshot] I might extend the logging work once it's in to spit out some debug messages about how many cores it is using, etc.
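Something as simple as the following could be dropped into the run script (a sketch, assuming the future package):

```r
library(future)

# Log what future can see before the run starts.
message("availableCores(): ", future::availableCores())
message("nbrOfWorkers():   ", future::nbrOfWorkers())
```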

seabbs commented 4 years ago

Well, that looks quite convincing!

seabbs commented 4 years ago

So, checking a run with everything on the most recent master version, I still see this but to a lesser extent. Given I recently adapted the setup_future function, this indicates to me that it is the source of the problem.

[Screenshot 2020-08-13 at 16:38:22]

seabbs commented 4 years ago

I think this is fixed so closing for now