carpentries-incubator / r-ml-tabular-data

A Data-Carpentry-style lesson on some ML techniques in R
https://carpentries-incubator.github.io/r-ml-tabular-data/

_episodes_rmd/04-Decision-Forests.Rmd: Major text and code edit #16

Closed · gmcdonald-sfg closed this issue 2 years ago

gmcdonald-sfg commented 2 years ago

Random forests do have a built-in way of assessing performance via the out-of-bag (OOB) error, and relying on it without a separate training/testing split may be technically defensible in some applications. Even so, I think it is always good practice to use separate training/testing datasets in ML. Here's a good explanation: https://www.dataminingapps.com/2018/02/is-it-really-necessary-to-split-a-data-set-into-training-and-validation-when-building-a-random-forest-model-since-each-tree-built-uses-a-random-sample-with-replacem/ . Beyond the three reasons listed there, another good reason to use a separate test set is that it lets you calculate any performance metric you want, not just the OOB error rate or OOB MSE that the random forest provides internally. I would suggest that all of the random forest examples use separate training/testing datasets, just as the previous examples do; a sketch of that workflow follows.
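
For concreteness, here is a minimal sketch of the suggested train/test workflow, assuming the `randomForest` package; `mtcars` and `mpg` are stand-ins for the episode's actual dataset and response, not taken from the lesson:

```r
library(randomForest)  # assumed; substitute whichever package the episode uses

set.seed(123)  # for reproducibility

# Hold out ~30% of the rows as a test set (mtcars is a placeholder dataset)
train_idx <- sample(nrow(mtcars), size = round(0.7 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Fit the forest on the training rows only
rf_fit <- randomForest(mpg ~ ., data = train)

# With a held-out test set you can compute any metric you like,
# not just the internal OOB estimates, e.g. test-set MSE:
preds <- predict(rf_fit, newdata = test)
mean((test$mpg - preds)^2)
```

Printing `rf_fit` still reports the OOB MSE, so the two estimates can be compared side by side.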

djhunter commented 2 years ago

I think the link to the Breiman reference addresses this, and the caveat addresses point #1. It's instructive to see some models of each type, so I'll add some language to the caveat. I also changed "you don't really need" to "you don't always need".