SimonDedman / gbm.auto

Machine-learning Boosted Regression Tree software suite for species distribution modelling in R
https://doi.org/10.1371/journal.pone.0188955
Other

Auto: Multicore processing #21

Open SimonDedman opened 6 years ago

SimonDedman commented 6 years ago

Only worthwhile for the BRT element; the rest is quick. It can only be done for multiple BRTs, i.e. Bin & Gaus at the same time, then Bin simp & Gaus simp at the same time: two threads, one for each of the separate delta parts. See the multicore element in Erik Franklin's optimiser code.
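A minimal sketch of the "two threads for the two delta parts" idea, using base R's parallel package (makeCluster() plus parLapply()). Assumptions, all hypothetical: the data are simulated, and glm() stands in for gbm.step() so the example runs without gbm or dismo installed. This is not gbm.auto's code, just the shape of the split: Bin (presence/absence) and Gaus (log abundance where present) are fitted concurrently on two workers.

```r
library(parallel)

# Hypothetical zero-inflated response, as handled by a delta (hurdle) model.
set.seed(1)
dat <- data.frame(x1 = runif(200), x2 = runif(200))
dat$y <- ifelse(runif(200) < 0.4, 0, exp(1 + 2 * dat$x1 + rnorm(200, sd = 0.3)))

# The two delta-model parts: Bin = presence/absence on all rows,
# Gaus = log(y) on positive rows only.
parts <- list(
  bin  = list(data = transform(dat, y = as.integer(y > 0)), family = "binomial"),
  gaus = list(data = transform(subset(dat, y > 0), y = log(y)), family = "gaussian")
)

# Stand-in fitter: glm() instead of gbm.step(), so the sketch is self-contained.
fit_part <- function(part) {
  glm(y ~ x1 + x2, data = part$data, family = part$family)
}

cl <- makeCluster(2)                    # one worker per delta part
fits <- parLapply(cl, parts, fit_part)  # Bin and Gaus fitted at the same time
stopCluster(cl)

sapply(fits, function(m) family(m)$family)
```

The same pattern would apply to the simplified pair (Bin simp & Gaus simp) as a second round; at most two workers are ever busy, which is why the gain is capped at roughly 2x for this step.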

See https://stackoverflow.com/questions/29873577/r-dismogbm-step-parameter-selection-function-in-parallel (detectCores, foreach):

cores <- detectCores(all.tests = FALSE, logical = FALSE)

https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf
https://www.r-bloggers.com/the-wonders-of-foreach/

Also pacman::p_load for better package loading, e.g. require(pacman); p_load(gbm, dismo, TeachingDemos, foreach, doParallel, data.table). Supposedly already incorporated; doesn't work.

Re-investigate multicore R:
https://stackoverflow.com/questions/4775098/r-with-a-multi-core-processor
http://cran.r-project.org/web/views/HighPerformanceComputing.html
https://groups.google.com/forum/#!topic/davis-rug/VLIXz5i3vZI

n.cores in gbm: "The number of CPU cores to use. The cross-validation loop will attempt to send different CV folds off to different cores. If n.cores is not specified by the user, it is guessed using the detectCores function in the parallel package. Note that the documentation for detectCores makes clear that it is not failsafe and could return a spurious number of available cores."

Candidate functions for n.cores: gbm.step, gbm.simplify, gbm.plot, gbm.plot.fits, gbm.predict.grids, gbm.perspec (3D plot). Try all of them? Is the 'parallel' package loaded by gbm?

Passing n.cores = detectCores(all.tests = FALSE, logical = FALSE) gave:

Warning messages:
1: In plot.window(...) : "n.cores" is not a graphical parameter
2: In plot.xy(xy, type, ...) : "n.cores" is not a graphical parameter
3: In axis(side = side, at = at, labels = labels, ...) : "n.cores" is not a graphical parameter
4: In axis(side = side, at = at, labels = labels, ...) : "n.cores" is not a graphical parameter
5: In box(...) : "n.cores" is not a graphical parameter
6: In title(...) : "n.cores" is not a graphical parameter
7: "n.cores" is not a graphical parameter

So: it fails, and is supposedly done by default anyway.
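Since the quoted gbm documentation warns that detectCores() is not failsafe, a defensive wrapper may be worth having before any core count is passed anywhere. A minimal sketch; the helper name safe_cores() and the choice to reserve one core are assumptions, not part of gbm or gbm.auto. It relies only on documented behaviour of parallel::detectCores(): it can return NA, and `logical = FALSE` (physical cores) is not honoured on every platform.

```r
library(parallel)

# Hypothetical helper: choose a worker count that is always a positive integer.
safe_cores <- function(reserve = 1) {
  n <- detectCores(all.tests = FALSE, logical = FALSE)  # physical cores, if supported
  if (is.na(n)) n <- detectCores(logical = TRUE)        # fall back to logical cores
  if (is.na(n)) n <- 1L                                 # unknown: run serially
  max(1L, as.integer(n) - as.integer(reserve))          # leave a core for the OS/IDE
}

n.cores <- safe_cores()
n.cores
```

n.cores is an argument of gbm::gbm() itself (for its CV folds), so it belongs in a model-fitting call; when it reaches a plotting function it ends up in the graphics `...`, which would be consistent with the "not a graphical parameter" warnings above.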
See: C:\Users\Simon\Dropbox\Galway\Analysis\R\Coilin R code\rmpi_example.R. MPI doesn't work on my laptop: due to the daily RStudio build, a broken laptop, or neither? Test call: gbm.auto(expvar = c(4:9,11), resvar = 12, grids = mygrids, lr = c(0.02), ZI = TRUE, map = FALSE, RSB = FALSE, tc = 2, varint = FALSE, savegbm = FALSE). Posted to Stack Exchange, no answer.

SimonDedman commented 5 years ago

Foreach loops: a single resvar run contains anywhere from 1 BRT (bin/gaus, no simplification) to many (bin + gaus × simp, for multiple tc/lr/bf combos). Parallelising within that run would be complicated by platform choice and the different options, and might not save much time on its own, so look instead to the biggest time sink: repetitions.

Usually I run many resvars with (roughly) the same params, and/or for a range of grids, and/or in gbm.loop. Therefore:

  1. gbm.loop: convert this to foreach loops
  2. manually run resvar and/or grids loops: convert to foreach loops https://www.r-bloggers.com/parallel-r-loops-for-windows-and-linux/

library(foreach)
library(doMC) # doMC forks processes, so Linux/macOS only; doParallel also works on Windows
registerDoMC(cores = n) # n = number of CPU cores
foreach(i = 1:10) %dopar% {
  # loop contents here
}

Can put each gbm.auto call in there. No need to worry about integrating outputs: they're all saved to disk and none are used by subsequent processes, so this should be easy.
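A sketch of the "one job per resvar, outputs saved to disk" pattern. It swaps foreach/doMC for base R's parallel::parLapply() on a socket cluster, which also runs on Windows. Assumptions, all hypothetical: run_one() is an illustrative stand-in for a gbm.auto() call, lm() replaces the BRT fit, and the data and output folder are invented. Each worker writes its own file and returns only the path, so nothing has to be merged afterwards.

```r
library(parallel)

# Hypothetical per-job function: in practice this would be one gbm.auto() call
# for one resvar. Here an lm() stand-in writes its own output file.
run_one <- function(resvar, dat, outdir) {
  fit <- lm(dat[[resvar]] ~ x, data = dat)
  out <- file.path(outdir, paste0(resvar, "_coefs.csv"))
  write.csv(data.frame(term = names(coef(fit)), estimate = coef(fit)),
            out, row.names = FALSE)
  out  # return only the file path; the results live on disk
}

set.seed(2)
dat <- data.frame(x = runif(50), sp1 = rnorm(50), sp2 = rnorm(50), sp3 = rnorm(50))
outdir <- file.path(tempdir(), "resvar_runs")
dir.create(outdir, showWarnings = FALSE)

cl <- makeCluster(2)
paths <- parLapply(cl, c("sp1", "sp2", "sp3"), run_one, dat = dat, outdir = outdir)
stopCluster(cl)
unlist(paths)
```

Because socket workers start as fresh R sessions, anything a job needs (data, packages, helper functions) must be passed in as arguments or loaded inside the function; forked doMC workers inherit the session instead.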

SimonDedman commented 4 years ago

gbm.loop: could use foreach's times(10), see https://datawookie.netlify.com/blog/2013/08/the-wonders-of-foreach/
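Repeated BRT runs differ because of their randomness (e.g. bag fraction), so running repeats in parallel needs independent, reproducible random-number streams per worker. A minimal sketch using base R's parallel::clusterSetRNGStream() instead of foreach's times(); one_rep() and its random subsample are hypothetical stand-ins for one gbm.loop repetition.

```r
library(parallel)

# Hypothetical single repetition: the random half-sample stands in for the
# stochastic element that makes repeated BRT fits differ.
one_rep <- function(i, dat) {
  idx <- sample(nrow(dat), size = floor(0.5 * nrow(dat)))
  mean(dat$y[idx])
}

run_reps <- function(n_reps, dat, seed) {
  cl <- makeCluster(2)
  on.exit(stopCluster(cl))
  clusterSetRNGStream(cl, iseed = seed)  # L'Ecuyer-CMRG streams, one per worker
  unlist(parLapply(cl, seq_len(n_reps), one_rep, dat = dat))
}

set.seed(3)
dat <- data.frame(y = rnorm(100))
a <- run_reps(10, dat, seed = 42)
b <- run_reps(10, dat, seed = 42)
identical(a, b)
```

With the same seed and the same number of workers, the two batches of repeats match exactly; calling set.seed() on the master alone would not make the workers reproducible.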

SimonDedman commented 4 years ago

gbm.auto: could use foreach over all parameters instead of the 3 nested loops over lr, bf and tc: "We can string together multiple calls to foreach() using the %:% nesting operator. foreach(n = 1:5) %:% foreach(m = 1:3) %do% max.eig(n, m)" https://datawookie.netlify.com/blog/2013/08/the-wonders-of-foreach/ The output is nested lists, which might become problematic, though the current outputs are all saved objects rather than screen output.
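One way around the nested-list output is to flatten the lr × bf × tc combinations into a single table first and parallelise over its rows. A minimal sketch with expand.grid() and parallel::parLapply(); the grid values are illustrative, and fit_combo() with its score column is a hypothetical placeholder for fitting one BRT, not a real CV score.

```r
library(parallel)

# Hypothetical tuning grid: every lr x bf x tc combination as one row,
# replacing three nested loops (or nested foreach %:% calls).
grid <- expand.grid(lr = c(0.01, 0.005), bf = c(0.5, 0.75), tc = c(2, 3, 5))

# Stand-in for fitting one BRT with these settings; returns a one-row
# data frame so the combined result stays flat.
fit_combo <- function(k, grid) {
  p <- grid[k, ]
  data.frame(lr = p$lr, bf = p$bf, tc = p$tc,
             score = p$lr * p$tc / p$bf)  # placeholder metric only
}

cl <- makeCluster(2)
res <- do.call(rbind, parLapply(cl, seq_len(nrow(grid)), fit_combo, grid = grid))
stopCluster(cl)
res[order(res$score), ]
```

Binding the one-row results gives one data frame with a row per combination, which is easier to rank or save than a list of lists.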

SimonDedman commented 3 years ago

xgboost would address this. See issue.