andybega / forecaster2

Coup forecasts
https://www.predictiveheuristics.com/forecasts
MIT License

Try out a skinny forest HP strategy for the RF models #9

Open andybega opened 4 years ago

andybega commented 4 years ago

Instead of a relatively small number of decision trees that are each fairly deep and grown on a lot of data, try out an alternative strategy using a large number of trees, where each tree is relatively shallow and grown on only a relatively small data sample. A variation of this is to also use stratified sampling that downsamples the negative cases.
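As a minimal sketch of what a single "skinny" forest fit could look like with ranger() directly, where the data frame dat and the binary factor outcome y are hypothetical placeholders and the specific values are illustrative rather than tuned:

library(ranger)

fit <- ranger(
  y ~ ., data = dat,
  num.trees = 5000,       # many trees ...
  sample.fraction = 0.1,  # ... each grown on ~10% of the rows
  max.depth = 6,          # keep trees shallow (needs a recent ranger version)
  probability = TRUE      # probability forest, for predicted probabilities
)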

mlr3 uses the following defaults for ranger():

library(mlr3learners)  # needed so the "classif.ranger" learner is registered

learner = mlr3::lrn("classif.ranger")
learner$param_set$default  # named list of the ranger() defaults
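Individual hyperparameters can be overridden when constructing the learner; the values here are placeholders, not recommendations:

learner = mlr3::lrn("classif.ranger", num.trees = 2000, min.node.size = 5)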

The "sample.fraction" argument can be a vector giving the number of cases (relative to the total number of cases) to sample from each outcome factor class. See the bottom answer at https://stats.stackexchange.com/questions/171380/implementing-balanced-random-forest-brf-in-r-using-randomforests, and the linked ranger issues.

So, for example, sample.fraction = c(0.1, 0.9) should give a resampled dataset with 10% positive cases and the same number of rows as the original data, assuming the positive class is the first factor level (the fractions follow the order of the outcome's factor levels).
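A minimal sketch with ranger() directly, again with a hypothetical dat and y; replace = TRUE keeps the class-wise draws valid even when a requested fraction exceeds a class's share of the data:

library(ranger)

# assumes levels(dat$y) puts the positive class first
fit <- ranger(
  y ~ ., data = dat,
  num.trees = 1000,
  sample.fraction = c(0.1, 0.9),  # 10% of N from class 1, 90% of N from class 2
  replace = TRUE
)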

Things to vary:

Chen, Liaw, and Breiman in the balanced random forest paper recommend drawing the same number of cases for both classes, i.e. a 1:1 proportion, e.g. sample.fraction = c(0.5, 0.5). Maybe that's a good starting point.

So in total, three tuning strategies (a combined sketch follows the list):

  1. Default RF with sample.fraction = 1, optimizing over mtry and min.node.size; this I already have
  2. Balanced RF with sample.fraction = c(0.5, 0.5)
  3. Skinny RF with a much larger number of trees but smaller sample fractions, e.g. c(0.1, 0.1)
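As a rough sketch, the three configurations as direct ranger() calls; dat and y are hypothetical, and all values are starting points to tune over rather than final settings:

library(ranger)

# 1. default RF: full bootstrap samples; tune mtry and min.node.size
rf_default  <- ranger(y ~ ., data = dat, num.trees = 1000,
                      sample.fraction = 1, probability = TRUE)

# 2. balanced RF: equal-size draws from each class
rf_balanced <- ranger(y ~ ., data = dat, num.trees = 1000,
                      sample.fraction = c(0.5, 0.5), replace = TRUE,
                      probability = TRUE)

# 3. skinny RF: many more trees, each grown on a small per-class sample
rf_skinny   <- ranger(y ~ ., data = dat, num.trees = 5000,
                      sample.fraction = c(0.1, 0.1), replace = TRUE,
                      probability = TRUE)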