daviddalpiaz / r4sl

📈 Machine Learning from the perspective of a Statistician using R
https://daviddalpiaz.github.io/r4sl/

caret model$results outputs SD not se, right? #19

Open ttimbers opened 4 years ago

ttimbers commented 4 years ago

Here you take `RMSESD` from the `results` component of a `train` model object and then refer to it as "se" and "standard error" in the text. However, SD commonly refers to standard deviation. I think something has to be done to calculate the standard error from the standard deviation. Or is caret's output misleading?

daviddalpiaz commented 4 years ago

That's likely just sloppiness on my part. I believe that, technically, it is the sample standard deviation of the RMSE values computed on the individual folds.
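To make the SD/SE distinction concrete, here is a minimal sketch in Python (not caret itself). It assumes, as described above, that the reported value is the sample standard deviation of the per-fold RMSE values. Dividing that SD by the square root of the number of folds gives a standard error for the *mean* cross-validated RMSE. The fold values are hypothetical, and the function name `rmse_summary` is made up for illustration. Note that folds share training data, so treating the fold RMSEs as independent, which the `sqrt(k)` step does, is only an approximation.

```python
import math
import statistics


def rmse_summary(fold_rmse):
    """Summarize per-fold RMSE values from k-fold cross-validation.

    Returns (mean RMSE, sample SD across folds, SD / sqrt(k)).
    The SD corresponds to what caret labels RMSESD; the last value is a
    standard error for the mean RMSE, assuming roughly independent folds.
    """
    k = len(fold_rmse)
    if k < 2:
        raise ValueError("need at least two folds to estimate spread")
    mean_rmse = statistics.mean(fold_rmse)
    # Sample SD with an n - 1 denominator, matching R's sd().
    sd_rmse = statistics.stdev(fold_rmse)
    se_rmse = sd_rmse / math.sqrt(k)
    return mean_rmse, sd_rmse, se_rmse


# Hypothetical RMSE values from 5 folds (illustrative, not real output).
folds = [3.1, 2.8, 3.4, 3.0, 2.9]
mean_rmse, sd_rmse, se_rmse = rmse_summary(folds)
print(f"mean={mean_rmse:.3f}  SD={sd_rmse:.3f}  SE={se_rmse:.3f}")
```

With five folds the SE is the SD divided by about 2.24, so labeling the SD as "se" overstates the uncertainty in the mean RMSE by that factor.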

Although, I suppose that on some level it could be viewed as an estimate of the standard deviation of the RMSE when the model is applied to new data. So in that sense, it's a standard error of the generalization error? I think the caret authors agree with this interpretation; they flip-flop between SD and SE in the documentation for the `oneSE` selection function here: https://www.rdocumentation.org/packages/caret/versions/6.0-84/topics/oneSE

I actually have a note to myself from a previous semester to figure out a better way to explain this interchange to my students. I like using caret, which labels these values SD, but I also like teaching the one-standard-error rule that is described in ISLR and implemented in caret. This is a pretty timely issue as I start thinking about it for the new semester.
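For reference, the one-standard-error rule mentioned above can be sketched in a few lines. This is a hedged Python illustration in the spirit of ISLR's rule and caret's `oneSE`, not caret's actual code. It assumes the candidates are ordered from simplest to most complex, that all were evaluated on the same `k` folds, and that the SE is taken as the best model's SD divided by `sqrt(k)`. The `Candidate` class, `one_se_select`, and the "degree" labels are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One tuning-parameter setting and its cross-validated RMSE summary."""
    name: str
    mean_rmse: float
    sd_rmse: float  # sample SD of per-fold RMSE (what caret calls RMSESD)


def one_se_select(candidates, k):
    """Return the simplest candidate within one SE of the best mean RMSE.

    Assumes `candidates` is ordered from simplest to most complex and that
    every candidate was evaluated on the same k folds.
    """
    if not candidates:
        raise ValueError("no candidates")
    best = min(candidates, key=lambda c: c.mean_rmse)
    # One standard error of the best model's mean RMSE, approximated as SD / sqrt(k).
    threshold = best.mean_rmse + best.sd_rmse / math.sqrt(k)
    for c in candidates:  # the first hit is the simplest qualifying model
        if c.mean_rmse <= threshold:
            return c
    return best  # not reached in practice: best always meets the threshold


# Hypothetical results for polynomial degrees 1-4 with 5-fold CV.
cands = [
    Candidate("degree=1", 4.00, 0.5),
    Candidate("degree=2", 3.20, 0.4),
    Candidate("degree=3", 3.00, 0.5),
    Candidate("degree=4", 3.05, 0.6),
]
print(one_se_select(cands, k=5).name)
```

In this made-up example, degree 3 has the lowest mean RMSE. Degree 2 is within one SE of it (3.20 vs. a threshold of about 3.22), so the rule prefers the simpler model. Using the raw SD instead of the SE would widen the threshold and could pick an even simpler model.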

Somewhat unrelated, but out of curiosity: how did you come across R4SL? It's a very unfinished project of mine that I don't really tell anyone about except my UIUC students. (So I'm always curious how it gets discovered.) I'm also in the process of decommissioning it and replacing it with this: https://github.com/daviddalpiaz/bsl (Mostly so I can get a fresh start on it.)