SimonDedman / gbm.auto

Machine-learning Boosted Regression Tree software suite for species distribution modelling in R
https://doi.org/10.1371/journal.pone.0188955

Low-N Sensitivity analysis, bootstrapping, optimising #18

Open SimonDedman opened 6 years ago

SimonDedman commented 6 years ago

Chuck stuff: It does seem that the Gaussian models stop working reliably (I got individual runs to work for bull and sandbar sharks, but could never get the same parameters to work more than once) somewhere between 44 and 33 “positive” sets. I wonder if it might be worth a separate paper doing some kind of sensitivity analysis to figure out where that line actually is? [chuck]
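A minimal sketch of what that sensitivity analysis might look like: subsample the positive sets down to progressively smaller counts and record how often a Gaussian BRT still fits. This assumes a data.frame `dat` with a response column `cpue` (0 = absence) and predictor names in `preds`; `gbm.step()` is from the dismo package and returns NULL when it cannot fit a model. The count grid, repeat number, and BRT settings are placeholders, not tested values.

```r
# Hypothetical sketch: at how few positive sets does the Gaussian BRT stop
# fitting reliably? Assumes `dat`, `cpue`, and `preds` as described above.
library(dismo)

pos     <- dat[dat$cpue > 0, ]        # positive sets only (Gaussian stage)
counts  <- seq(50, 20, by = -5)       # candidate numbers of positive sets
n_reps  <- 10                         # repeats per count, to test reliability

success <- sapply(counts, function(n) {
  mean(replicate(n_reps, {
    sub <- pos[sample(nrow(pos), n), ]
    m <- try(gbm.step(data = sub, gbm.x = preds, gbm.y = "cpue",
                      family = "gaussian", tree.complexity = 2,
                      learning.rate = 0.005, bag.fraction = 0.5,
                      silent = TRUE), silent = TRUE)
    !is.null(m) && !inherits(m, "try-error")   # TRUE if the model converged
  }))
})
data.frame(n_positive = counts, success_rate = success)
```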

A bootstrapping function: essentially loop the same params, but remove random single/multiple rows/columns of data, to test for e.g. a time-series effect even when single-year splits aren't powerful enough to run a BRT on their own because of insufficient data. See library(boot), boot(), and https://www.r-bloggers.com/the-wonders-of-foreach/. Maybe this kind of analysis could fit into the coding for one of these, or all 3 together; they're all clearly related: repeating runs, sometimes taking stuff out, and collating answers at the end. SD: I'm just bouncing an idea around my head whereby the code could (see the sketch after this list):

  1. run lower and lower (individual bin & gaus) lr/bf combos until they fail
  2. repeat the last working one a few times to test for resilience
  3. Creep down a LITTLE bit to see if it can go a bit lower reliably (settable aggression parameter)
  4. Essentially iterate until it's found its lowest reliable number
  5. Describe the curve of lr/bf combo and (reliable) success rate, noting run time.
  6. Do this for a number of species
  7. Bootstrap to make the data poorer and poorer (manual after this point?)
  8. Throw all the results together to see if we have something that looks to reveal an underlying relationship, i.e. data strength vs gbm success & processing time
  9. Describe that relationship for various species.
  10. Are there commonalities?
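A rough sketch of how steps 1-4 might be wired up, again assuming a data.frame `dat`, predictor names `preds`, and `gbm.step()` from dismo as above; the learning-rate grid, bag fraction, repeat count, and the 0.75 "creep" factor are placeholders standing in for a settable aggression parameter.

```r
# Hypothetical sketch of steps 1-4: walk the learning rate down until the
# Gaussian BRT stops fitting, then confirm the last working value is reliable.
library(dismo)

fits_ok <- function(lr, bf = 0.5, reps = 1) {
  all(replicate(reps, {
    m <- try(gbm.step(data = dat, gbm.x = preds, gbm.y = "cpue",
                      family = "gaussian", tree.complexity = 2,
                      learning.rate = lr, bag.fraction = bf,
                      silent = TRUE), silent = TRUE)
    !is.null(m) && !inherits(m, "try-error")
  }))
}

lr_grid <- c(0.01, 0.005, 0.002, 0.001, 0.0005)  # step 1: lower and lower lr
last_ok <- NA
for (lr in lr_grid) {
  if (fits_ok(lr)) last_ok <- lr else break
}

# step 2: repeat the last working lr a few times to test resilience
reliable <- !is.na(last_ok) && fits_ok(last_ok, reps = 5)

# step 3: creep down a little to see if it can reliably go lower
if (reliable && fits_ok(last_ok * 0.75, reps = 5)) last_ok <- last_ok * 0.75

last_ok  # step 4: the lowest reliable learning rate found
```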
SimonDedman commented 5 years ago

Much of this concept is subsumed within the gbm.tune plans; once gbm.tune is complete, steps 1-4 are done. Could then potentially add run-time code (easy), but is that important? Will people want to trade less run time for a worse CV score? I haven't yet worked with really big data; maybe this is a thing?
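If run time does turn out to matter, recording it per parameter combo is trivial; a minimal sketch, assuming the same hypothetical `dat`, `preds`, and dismo `gbm.step()` setup as above:

```r
# Record run time alongside model success for one lr/bf combo (placeholders).
timing <- system.time(
  m <- try(gbm.step(data = dat, gbm.x = preds, gbm.y = "cpue",
                    family = "gaussian", learning.rate = 0.005,
                    bag.fraction = 0.5, silent = TRUE), silent = TRUE)
)
data.frame(ok        = !is.null(m) && !inherits(m, "try-error"),
           elapsed_s = unname(timing["elapsed"]))
```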

Making-data-poorer bootstrapping: how much value is gained by this? The end point would be a rule-of-thumb sense of what's likely to work, but in terms of what? N, positive N, variance, number of expvars? Could bundle this into bfcheck, and/or update bfcheck & gbm.tune to be a one-stop shop for pre-run testing? gbm.tune() params to be identical to gbm.auto's (loop?). A sketch of that kind of pre-run summary is below.
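A hypothetical helper illustrating the kind of pre-run "data strength" summary such a one-stop shop could report; the metric set is just the ones floated above, and `data_strength()` is an invented name, not an existing gbm.auto or bfcheck function.

```r
# Hypothetical pre-run summary of data strength, assuming a data.frame `dat`,
# a response column name `resp`, and a vector of predictor names `expvars`.
data_strength <- function(dat, resp, expvars) {
  y <- dat[[resp]]
  data.frame(
    n          = nrow(dat),        # total sets
    positive_n = sum(y > 0),       # non-zero (positive) sets
    variance   = var(y[y > 0]),    # variance of the positive responses
    n_expvars  = length(expvars)   # number of explanatory variables
  )
}
# e.g. data_strength(dat, "cpue", preds)
```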