Open SimonDedman opened 6 years ago
Much of this concept is subsumed within the gbm.tune plans; once gbm.tune is complete, steps 1-4 are done. Could then potentially add run-time tracking (easy), but is that important? Will people want to trade a worse CV score for less run time? I haven't yet worked with really really big data; maybe this is a thing there?
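A minimal sketch of what "run time code" could look like: wrap each candidate parameter run in system.time() and collate elapsed time alongside the CV statistic, so users can see the time/CV trade-off. run_brt() here is a hypothetical stand-in for a single gbm.auto/gbm.tune fit, not the real function.

```r
# Hypothetical stand-in for one BRT fit; the real call would be gbm.auto/gbm.step.
run_brt <- function(lr, tc) {
  Sys.sleep(0.01)             # placeholder for the actual (slow) model fit
  list(cv_score = runif(1))   # placeholder CV statistic
}

# Candidate parameter grid, as gbm.tune might loop over.
grid <- expand.grid(lr = c(0.01, 0.005), tc = c(2, 3))
grid$elapsed_s <- NA_real_
grid$cv_score  <- NA_real_

for (i in seq_len(nrow(grid))) {
  t <- system.time(res <- run_brt(grid$lr[i], grid$tc[i]))
  grid$elapsed_s[i] <- t[["elapsed"]]
  grid$cv_score[i]  <- res$cv_score
}

# Users could then weigh CV score against run time per parameter set.
print(grid)
```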
Bootstrapping by making the data poorer: how much value is gained by this? We'd end up with a rule of thumb for what's likely to work, but in terms of what? N, positive N, variance, number of explanatory variables? Could bundle this into bfcheck, and/or update bfcheck & gbm.tune to be a one-stop shop for pre-run testing. gbm.tune() params to be identical to gbm.auto's (loop?).
Chuck's input: It does seem that the Gaussian models stop working reliably somewhere between 44 and 33 "positive" sets (I got individual runs to work for bull and sandbar sharks, but could never get the same parameters to work more than once). I wonder if it might be worth a separate paper doing some kind of sensitivity analysis to figure out where that line actually is? [chuck]
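The sensitivity analysis Chuck suggests could be sketched like this: repeatedly subsample the positive rows down to a target N and record how often the fit succeeds. fit_gaussian_brt() is hypothetical; in practice it would wrap the Gaussian gbm.step/gbm.auto call inside tryCatch(), since those calls error out at low positive N. The failure threshold below is a toy stand-in, not a real estimate.

```r
set.seed(1)

# Hypothetical stand-in for a Gaussian BRT fit that fails below some
# (unknown in reality) number of positive sets; 35 is an arbitrary toy value.
fit_gaussian_brt <- function(dat) {
  if (sum(dat$response > 0) < 35) stop("too few positive sets")
  TRUE
}

# Toy dataset: 200 zero rows, 60 positive rows.
full <- data.frame(response = c(rep(0, 200), runif(60, 1, 10)))

# Proportion of replicates that fit successfully at a given positive N.
success_rate <- function(n_pos, reps = 20) {
  ok <- replicate(reps, {
    pos  <- full[full$response > 0, , drop = FALSE]
    keep <- pos[sample(nrow(pos), n_pos), , drop = FALSE]
    dat  <- rbind(full[full$response == 0, , drop = FALSE], keep)
    !inherits(tryCatch(fit_gaussian_brt(dat), error = function(e) e), "error")
  })
  mean(ok)
}

rates <- sapply(c(55, 44, 33), success_rate)
names(rates) <- c(55, 44, 33)
print(rates)  # in this toy setup, success collapses between 44 and 33
```

Replacing fit_gaussian_brt() with a real gbm.auto call, and sweeping n_pos finely, would locate the line Chuck describes for a given dataset.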
A bootstrapping function: essentially looping the same params, but removing random single or multiple rows/columns of data, to test for e.g. a time-series effect even when single-year splits aren't powerful enough to run a BRT on their own due to insufficient data. See library(boot), boot(), and https://www.r-bloggers.com/the-wonders-of-foreach/. Maybe this kind of analysis could fit into the coding for one of these, or all three together; they're all clearly related: repeating, sometimes taking stuff out, and collating answers at the end. SD: I'm just bouncing an idea around my head whereby the code could:
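The repeat/remove/collate loop described above could be sketched in base R as follows (foreach, as in the linked post, would let the replicates run in parallel). fit_once() is a hypothetical stand-in: here it just returns a slope from a toy lm(), where the real version would refit a fixed-parameter BRT and return whatever statistic is being tracked.

```r
set.seed(42)

# Toy data: true slope 0.5 plus noise.
dat <- data.frame(x = 1:100)
dat$y <- dat$x * 0.5 + rnorm(100)

# Hypothetical stand-in for one fixed-parameter model fit.
fit_once <- function(d) coef(lm(y ~ x, data = d))[["x"]]

n_reps <- 200
drop_n <- 10  # rows randomly removed per replicate (e.g. one year's worth)

est <- replicate(n_reps, {
  keep <- sample(nrow(dat), nrow(dat) - drop_n)
  fit_once(dat[keep, ])
})

# Collate: if the estimates stay tight while rows are removed, the model
# is robust to losing that much data.
summary(est)
quantile(est, c(0.025, 0.975))
```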