Closed: johannes-kk closed this 4 years ago.
If we no longer use mtry==0 to represent sqrt(), can mtry ever be 0?
Hmm... I think this boils down to another design choice. If we set mtry at the tree level, one way to implement this in findBestSplit() is to shuffle the column indices when mtry != num_features, and not otherwise. This assumes that any vanilla DecisionTree will have mtry == num_features (which it really should; otherwise it's not vanilla CART). A DecisionTree can still be used as a subtree in a RandomForest, because it allows a smaller mtry. In that case it also shuffles the indices and draws a random subset of mtry of them.
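A minimal sketch of that rule, assuming Python and a hypothetical helper candidate_features() standing in for the column-selection step inside findBestSplit(). It also assumes mtry must lie in [1, num_features], so 0 is rejected rather than used as a sentinel; that validation is an assumption, not something decided in this thread.

```python
import random
from typing import List


def candidate_features(num_features: int, mtry: int, rng: random.Random) -> List[int]:
    """Hypothetical helper: the column indices findBestSplit() would scan at one node."""
    # Assumption: mtry must lie in [1, num_features]; 0 is invalid, not a sentinel.
    if not 1 <= mtry <= num_features:
        raise ValueError(f"mtry must be in [1, {num_features}], got {mtry}")
    indices = list(range(num_features))
    if mtry == num_features:
        # Vanilla CART: scan every column in order, no shuffling.
        return indices
    # Random-forest subtree: shuffle the columns, then keep a random subset of size mtry.
    rng.shuffle(indices)
    return indices[:mtry]


rng = random.Random(42)
print(candidate_features(5, 5, rng))   # all five columns, in order
print(candidate_features(10, 3, rng))  # three distinct, randomly chosen columns
```

With this rule, the tree itself never needs a "forest mode" flag: whether to shuffle follows entirely from comparing mtry with num_features.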
Primarily, what I'd like is this: when we pass mtry = -1 (i.e. the default value) to the DecisionTree constructor, it sets DecisionTree.mtry = num_features, whereas for RandomForest the same default mtry = -1 means its constructor sets RandomForest.mtry = sqrt(num_features), and findBestSplit() then applies shuffling.
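A sketch of those two constructor defaults, assuming Python stand-ins for the DecisionTree and RandomForest classes (the real constructors are not shown here) and -1 as the "use the default" sentinel. Rounding sqrt(num_features) down, with a floor of 1, is an assumption; the thread only says "sqrt".

```python
import math


class DecisionTree:
    def __init__(self, num_features: int, mtry: int = -1):
        self.num_features = num_features
        # -1 means "use the default": a vanilla CART tree considers every feature.
        self.mtry = num_features if mtry == -1 else mtry


class RandomForest:
    def __init__(self, num_features: int, mtry: int = -1):
        self.num_features = num_features
        # Same sentinel, different default: floor(sqrt(num_features)), never below 1.
        self.mtry = max(1, math.isqrt(num_features)) if mtry == -1 else mtry


print(DecisionTree(16).mtry)  # 16
print(RandomForest(16).mtry)  # 4
print(RandomForest(10).mtry)  # 3 (rounded down)
```

For num_features > 1, floor(sqrt(num_features)) is strictly smaller than num_features, so a findBestSplit() that shuffles only when mtry != num_features would shuffle for forest defaults and not for vanilla trees.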
Currently mtry is passed as a parameter to the DecisionTree or RandomForest constructor, then passed on to findBestSplit. The default should differ for DT and RF: mtry equal to the number of predictors and mtry equal to the square root of that number, respectively. As such, move setting the mtry default to the parent tree object, and pass the resolved value on.
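One way the "resolve in the parent, pass it on" idea could look, sketched in Python under the same assumptions as before (hypothetical stand-in classes, -1 as the sentinel, sqrt rounded down). The num_trees parameter is also hypothetical. The forest resolves its default once and hands every subtree a concrete mtry, so neither the subtrees nor findBestSplit ever see the sentinel.

```python
import math
from typing import List


class DecisionTree:
    def __init__(self, num_features: int, mtry: int = -1):
        self.num_features = num_features
        # A standalone tree resolves its own default; a subtree receives a concrete value.
        self.mtry = num_features if mtry == -1 else mtry


class RandomForest:
    def __init__(self, num_features: int, num_trees: int = 10, mtry: int = -1):
        # The parent resolves the default exactly once ...
        self.mtry = max(1, math.isqrt(num_features)) if mtry == -1 else mtry
        # ... and passes the concrete value on, so no subtree ever sees the sentinel.
        self.trees: List[DecisionTree] = [
            DecisionTree(num_features, self.mtry) for _ in range(num_trees)
        ]


forest = RandomForest(num_features=20, num_trees=3)
print(forest.mtry)                           # 4
print([tree.mtry for tree in forest.trees])  # [4, 4, 4]
```

Keeping default resolution in the constructors means findBestSplit only ever receives a valid mtry, which is what makes the "shuffle iff mtry != num_features" check safe.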