SimonDedman / gbm.auto

Machine-learning Boosted Regression Tree software suite for species distribution modelling in R
https://doi.org/10.1371/journal.pone.0188955

Processing time estimate #23

Open SimonDedman opened 6 years ago

SimonDedman commented 6 years ago

See p3 of https://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf for system.time({code to run})[3].

Option to have R poll the computer and, when you say go, pop up a box saying: "Based on the parameters you've selected to try, this will run X models on a Y-item-sized dataset (Y = variables * count), which may take about Z minutes based on your processor. You have a multicore processor, so it will take Z/#processors (time) if you have multicore processing enabled - see here LINK" OK/Cancel

Edit the BRT progress counter to account for multiple resvars: it currently counts e.g. n/8 then loops back to 1. Not useful. It could also print the current resvar name.

Maybe do a running time thing? "This is BRT N of X, Y% complete, took time Z, total expected time AA, time remaining AB". See proc.time() and http://www.ats.ucla.edu/stat/r/faq/timing_code.htm

    ptm <- proc.time() # Start the clock!
    elapsed <- proc.time() - ptm # Stop the clock
    elapsed["elapsed"] # is the time taken in seconds

Maybe add to gbm.auto's report. Plus: size (total cells, dim) of the 'active' database; whether maps are generated; RSB; BnW; savegbm; varint; sizes of tc, lr, bf (smaller numbers take longer); length of tc, lr, bf; CPU speed; number of CPU cores (assuming multithreading sorted!); RAM size.
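A rough sketch of the running-time counter described above, in case it helps. brt_progress() is a hypothetical helper and not part of gbm.auto, the resvar names and parameter values are made up, and the expected total assumes every BRT takes about the same time, which probably won't hold across different tc/lr/bf values.

```r
# Hypothetical helper (not part of gbm.auto): builds one progress line.
brt_progress <- function(done, total, start_time, resvar = NA_character_) {
  # Seconds elapsed since the clock was started
  elapsed <- unname((proc.time() - start_time)["elapsed"])
  # Naive estimate: assumes every BRT takes roughly the same time
  expected_total <- elapsed / done * total
  sprintf(
    "BRT %d of %d (%s), %.0f%% complete, took %.1fs, total expected %.1fs, remaining %.1fs",
    done, total, resvar, 100 * done / total,
    elapsed, expected_total, expected_total - elapsed
  )
}

# Made-up example values
resvars <- c("species_a", "species_b")
combos <- expand.grid(tc = c(2, 5), lr = c(0.01, 0.005), bf = c(0.5, 0.75))

# One counter spanning all resvars: 1..16 rather than 1..8 twice
total <- length(resvars) * nrow(combos)

start_time <- proc.time() # Start the clock
done <- 0L
for (rv in resvars) {
  for (i in seq_len(nrow(combos))) {
    Sys.sleep(0.01) # stand-in for fitting one BRT
    done <- done + 1L
    message(brt_progress(done, total, start_time, rv))
  }
}
```

Counting over length(resvars) * nrow(combos) is what stops the counter looping back to 1 for each resvar, and passing the resvar name in gets it printed on every line.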

Presumably something like:

    time = dbasesize * sizes(tc,lr,bf) * length(tc*lr*bf)   # sizes would need a reciprocal, smaller takes longer
           + savegbm + varint                               # all (*2 if ZI=true?)
           + map * prod(lengths(tc,lr,bf))
           + RSB * prod(lengths(tc,lr,bf))
           + BnW * prod(lengths(tc,lr,bf))

All divided by CPUcores * RAM?

Run gbm.auto loads of times, changing one parameter through its range each time, and calculate the relationship of model parameters to processing time on MY laptop. Then can logically work out how long it should take on another machine? Try it on my home one.

RAM doesn't matter unless it's limiting? Limitingness is a function of dbasesize and parameters, especially if I'm not cleaning the workspace within the function run?
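The "change one parameter through its range and fit the relationship" idea could be prototyped along these lines. This is only a sketch: fake_run() is a hypothetical stand-in for a real gbm.auto call, the sizes are arbitrary, and a straight-line fit is an assumption that would need checking against real timings.

```r
# Hypothetical stand-in for one gbm.auto run: work grows with the number of rows.
fake_run <- function(n_rows) {
  x <- matrix(runif(n_rows * 20), ncol = 20)
  invisible(sort(rowSums(x)))
}

# Change one parameter (here, database size) through its range, timing each run
sizes <- c(5e4, 1e5, 2e5, 4e5)
timings <- data.frame(
  n_rows = sizes,
  seconds = vapply(
    sizes,
    function(n) unname(system.time(fake_run(n))["elapsed"]),
    numeric(1)
  )
)

# Fit the relationship of the parameter to processing time on this machine
# (linear form is an assumption)
fit <- lm(seconds ~ n_rows, data = timings)

# Estimate the processing time for a size that has not been run yet
predicted <- predict(fit, newdata = data.frame(n_rows = 8e5))
print(timings)
print(predicted)
```

Repeating this per parameter (tc, lr, bf, savegbm, varint, map, RSB, BnW) would give the coefficients for the formula above on one machine. Scaling to another machine would still need at least one calibration run there.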

SimonDedman commented 6 years ago

https://www.r-bloggers.com/5-ways-to-measure-running-time-of-r-code/