ellisp / forecastHybrid

Convenient functions for ensemble forecasts in R combining approaches from the {forecast} package
GNU General Public License v3.0

Restrict to a single core #92

Closed seabbs closed 4 years ago

seabbs commented 4 years ago

Hi,

Firstly thanks for a great package - this is excellent work.

TLDR: I am seeing multicore usage despite parallel being set to FALSE and it apparently defaulting to FALSE in all included models. Is there an option I can take to control this behaviour? This could very easily just be user error!

Detail:

We are running country-specific (and regional) covid forecasts using forecastHybrid with each area requiring around 2000 runs in order to properly account for the uncertainty (https://epiforecasts.io/covid/posts/global/). The implementation is slightly complex as it is nested in 2 development packages (https://github.com/epiforecasts/EpiSoon/blob/master/R/forecastHybrid_model.R and here https://github.com/epiforecasts/covid-global/blob/master/update_deaths_nowcasts.R). Any suggestions would be really great. We have just switched from using fable so I am sure there are lots of improvements we can make to our usage.

dashaub commented 4 years ago

forecastHybrid_model.R is giving me 404.

In update_deaths_nowcasts.R, I assume EpiSoon::forecastHybrid_model() just passes the arguments into forecast(hybridModel()), so it looks like you aren't doing anything exotic with the package and it should work fine, but it would be easier to examine the behavior in isolation if you have a small dataset that exhibits the unwanted parallel behavior.

> Is there an option I can take to control this behaviour

If we know which model is responsible for the parallel behavior, we can pass in the necessary arguments to disable it; for example, if it is the auto.arima model, we could pass in a.args = list(parallel = FALSE). However, this really shouldn't be necessary, and I'm surprised you are seeing any of this, since parallel = FALSE by default. You could try setting num.cores = 1, and I would expect that to mask the issue, but I'd still like to get to the bottom of this.
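A minimal sketch of the two suggestions above, assuming the built-in wineind series from forecast as stand-in data and a two-model ensemble chosen only to keep the run short. Whether num.cores = 1 actually masks the multicore usage is untested here; this only shows where the arguments go.

```r
library(forecast)        # provides the wineind example series
library(forecastHybrid)

# Stand-in data and model choice ("a" = auto.arima, "e" = ets) are assumptions
fit <- hybridModel(
  wineind,
  models = "ae",
  a.args = list(parallel = FALSE),  # forwarded to auto.arima()
  parallel = FALSE,                 # no parallel fitting of component models
  num.cores = 1L,                   # cap the cores hybridModel itself may use
  verbose = FALSE
)

fc <- forecast(fit, h = 12)
```

Watching htop while this runs is the simplest way to check whether a given component still spreads across cores.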

seabbs commented 4 years ago

Thanks for the quick response @dashaub,

Happy to make a reprex, but I thought it was known behaviour as it appears to happen on all the examples I have tested (though very minimal examples make it hard to spot as the run time is so short).

Yes, that was my understanding of how everything is structured. All the forecast models I have checked also appear to have parallel off by default, so it is quite strange.

Apologies for the broken link - the package was just reorganised. Updated link: https://github.com/epiforecasts/EpiSoon/blob/1fbacf2b0fab1fddfc8ab115c0729a0c627fd2a8/R/model-wrappers.R#L252

I'll have another look and make a small example.

Sam

dashaub commented 4 years ago

Are any of your time series very long? From the documentation for the forecast::tbats() model, the default is use.parallel = length(y) > 1000. I just ran some tests with shorter series too, though, and I am still seeing parallel behavior. For example:

library(forecast) # provides tbats() and the wineind dataset
series <- rep(wineind, 5) # repeat the series to make it longer
length(series) # 880
tbats(series) # uses more than 2 cores on my machine

I won't yet say that this is a bug with "forecast" without more investigation, but it does look like this behavior at least occurs there, and from watching htop while hybridModel fits the other component models, it looks like the others use only one core.

Manually overriding use.parallel fixes it for me though, so this might be a workaround for you:

tbats(series, use.parallel = FALSE)
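The same override can presumably be applied from forecastHybrid by forwarding it through t.args, the argument list that hybridModel hands to tbats(). This is a sketch under that assumption, reusing the series from the test above; the choice of component models is illustrative only.

```r
library(forecast)
library(forecastHybrid)

series <- rep(wineind, 5)  # same 880-point series as in the test above

# "a" = auto.arima, "e" = ets, "t" = tbats; this model set is an assumption
fit <- hybridModel(
  series,
  models = "aet",
  t.args = list(use.parallel = FALSE),  # forwarded to tbats()
  parallel = FALSE,
  verbose = FALSE
)

fc <- forecast(fit, h = 14)
```

If the tbats component is the only one using extra cores, this should keep the whole fit on a single core without touching the other models.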

seabbs commented 4 years ago

Nice job tracking this down.

No, all are currently short (daily, starting in January at the earliest). That sounds like exactly the issue; interesting that it only occurs in a single model. Sounds like a good fix, will implement.

Thanks for checking this out, it saved me a lot of hassle! Happy to close this as it doesn't sound like the issue is in forecastHybrid. That said, it might be wise to internally force supported models to be single-core, as I would imagine that most users would want to use your external parallel support instead of within-model parallelization.
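One way to get that split by hand is to pin every model to a single core and parallelise across series instead. The sketch below assumes a hypothetical list of per-area series (two forecast datasets stand in for it), a hypothetical helper fit_single_core(), and base R's parallel::mclapply for the outer loop; it is not what EpiSoon or forecastHybrid do internally.

```r
library(parallel)
library(forecast)
library(forecastHybrid)

# Hypothetical input: one series per area (stand-in datasets from forecast)
series_list <- list(area_a = wineind, area_b = woolyrnq)

# Hypothetical helper: fit one ensemble with within-model parallelism disabled
fit_single_core <- function(y) {
  hybridModel(
    y,
    models = "aet",
    a.args = list(parallel = FALSE),      # auto.arima()
    t.args = list(use.parallel = FALSE),  # tbats()
    parallel = FALSE,
    verbose = FALSE
  )
}

# mclapply forks, which is unavailable on Windows, so fall back to one worker
n_workers <- if (.Platform$OS.type == "windows") 1L else 2L
fits <- mclapply(series_list, fit_single_core, mc.cores = n_workers)
```

With this layout the total core count is set in one place (mc.cores), so 2000 runs per area cannot oversubscribe the machine through nested parallelism.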