Closed: taylorreiter closed this issue 4 years ago
Allow me to answer this question:
From the README
Out-of-bag predictions are used for evaluation, which makes it much faster than other packages and tuning strategies that use for example 5-fold cross-validation
In the code: https://github.com/PhilippPro/tuneRanger/blob/abe82774ce449f6acc5623e9e9e5d867c3efd910/R/tuneRanger.R#L99-L100
In other words: tuneRanger does not need a separate test set because each tree is trained only on a bootstrap sample (the "bag") of the data, so the remaining out-of-bag observations can be used to obtain an unbiased performance estimate for each tree, and therefore for the forest as a whole.
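tuneRanger itself is an R package, but the out-of-bag idea it relies on is language-agnostic. As a minimal sketch of the concept (using scikit-learn's `RandomForestClassifier` rather than tuneRanger's API, and a synthetic dataset), each tree's bootstrap sample leaves some rows unused, and those rows score the model without any held-out test set:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic data purely for illustration
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# oob_score=True tells each tree to be evaluated on the rows
# that were NOT in its bootstrap sample ("out of bag")
clf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
clf.fit(X, y)

# Accuracy estimated from out-of-bag samples alone --
# no separate test split was needed to compute it
print(clf.oob_score_)
```

Because every row is out-of-bag for some trees, the whole dataset contributes to the performance estimate, which is why OOB evaluation can stand in for k-fold cross-validation at a fraction of the cost.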
Does this answer your question?
Yes, thank you so much for taking the time to answer this question, and to give the details so clearly! I really appreciate it.
You're welcome.
Hi @PhilippPro! Thank you for tuneRanger; I've been having fun trying it out, and it has dramatically simplified my tuning pipeline. I'm curious how the samples are handled under the hood: do I need to split my data set 70:30 into training and testing, and then run tuneRanger on the 70%? I have been using a 70:30 train/test split, plus an independent validation dataset. I don't see a train/test split used in your documentation, so I am curious what the recommended best practice is.