carpentries-incubator / r-ml-tabular-data

A Data-Carpentry-style lesson on some ML techniques in R
https://carpentries-incubator.github.io/r-ml-tabular-data/

_episodes_rmd/03-Decision-Trees.Rmd: Text edit #11

Closed: gmcdonald-sfg closed this issue 2 years ago

gmcdonald-sfg commented 2 years ago

Suggest modifying the language in the “Sensitivity” section. Sensitivity is a specific term in ML, referring to a particular model performance metric (the true positive rate). I would instead use this section as an opportunity to talk about overfitting, which is a really important concept in ML and a nice segue into how random forests can help mitigate this problem. Here’s a nice explanation: https://towardsdatascience.com/random-forests-and-the-bias-variance-tradeoff-3b77fee339b4
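The overfitting-versus-averaging point can be sketched quickly. This is a Python/scikit-learn example rather than the lesson's R, and the dataset and parameters are illustrative assumptions, not taken from the lesson: a fully grown decision tree memorizes noisy training labels, while a random forest of many such trees typically generalizes better.

```python
# Illustrative sketch (Python/scikit-learn, not the lesson's R code):
# a fully grown tree fits noisy training data perfectly but drops on
# held-out data; a random forest averages many trees and is less prone
# to overfitting the noise.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 randomly flips 20% of the labels, simulating label noise
X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
forest = RandomForestClassifier(n_estimators=200,
                                random_state=0).fit(X_tr, y_tr)

print("tree   train/test:", tree.score(X_tr, y_tr), tree.score(X_te, y_te))
print("forest train/test:", forest.score(X_tr, y_tr), forest.score(X_te, y_te))
```

The unpruned tree scores 1.0 on the training set despite the injected label noise, which is the overfitting symptom the comment describes; the forest's train/test gap is usually smaller.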

djhunter commented 2 years ago

"Sensitivity" is a term from statistics, so I'm using it in that sense. (The ML people might also use the term in a different context). I was intentionally avoiding using the terms "bias" and "variance" because I was afraid that defining them would be information overload. Section 2.2.2 of https://hastie.su.domains/ISLR2/ISLRv2_website.pdf has a nice explanation. Also, Bias/Variance is not exactly the same thing as underfitting/overfitting, especially when using XGBoost. There's a nice discussion here: https://stats.stackexchange.com/questions/204489/discussion-about-overfit-in-xgboost

Maybe there's another term I could use besides sensitivity? But my inclination is to leave it as is.
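The "sensitivity" (later renamed "non-robustness") being debated here can also be demonstrated directly. Again a Python/scikit-learn sketch with made-up data, not the lesson's R code: refitting a fully grown tree after dropping a small fraction of rows can change some of its predictions, because single trees are unstable with respect to small perturbations of the training set.

```python
# Illustrative sketch (Python/scikit-learn, not the lesson's R code):
# drop 5% of the training rows, refit the tree, and measure how many
# predictions change -- single trees are non-robust to such perturbations.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=1)

full = DecisionTreeClassifier(random_state=1).fit(X, y)

# remove every 20th row (5% of the data) and refit
keep = np.ones(len(y), dtype=bool)
keep[::20] = False
perturbed = DecisionTreeClassifier(random_state=1).fit(X[keep], y[keep])

disagree = np.mean(full.predict(X) != perturbed.predict(X))
print(f"fraction of predictions that changed: {disagree:.2%}")
```

Random forests reduce exactly this instability by averaging over many trees, each grown on a bootstrap resample, which is why the two framings (non-robustness and overfitting) end up pointing at the same remedy.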

djhunter commented 2 years ago

Maybe I'll add some comments to the instructor notes.

djhunter commented 2 years ago

Changed terminology in lesson to "non-robustness", and added to instructor notes.