UBC-DSCI / introduction-to-datascience

Open Source Textbook for DSCI100: Introduction to Data Science in R
https://datasciencebook.ca/

Possibly wrong sizes in under/overfitting section in Reg1 #474

Closed: trevorcampbell closed this issue 1 year ago

trevorcampbell commented 1 year ago

Not sure if we caught this already, but when preparing the Python version of Ch7, Gloria found:

In [Section 7.7 Underfitting and overfitting](https://github.com/UBC-DSCI/introduction-to-datascience/blob/master/regression1.Rmd#L500), the textbook says to fit a KNN regression model with neighbors=932 (the size of the entire dataset); however, the code fits the model on only the training split (size=699). Actually using neighbors=932 would cause an error, since the training sample size is smaller than the number of neighbors. (The R textbook avoids this error because it uses an if/else statement and never actually fits neighbors=932; instead it takes the mean of all training samples, which is equivalent to using neighbors=699, the training sample size.) Should I keep saying 932 and do the same thing for the Python textbook?
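For anyone who wants to check the two claims above without the textbook data, here is a minimal sketch in plain Python. It is not the textbook's code: `knn_regress` is a hypothetical helper written for this illustration, the data are synthetic (only the sizes 932 and 699 come from the issue), and the `ValueError` guard is a stand-in for whatever error the real library raises.

```python
import random
import statistics


def knn_regress(train_x, train_y, query, k):
    """Predict by averaging the targets of the k nearest training points.

    Hypothetical helper for illustration only (single numeric feature).
    """
    if k > len(train_x):
        # Stand-in for the library error described in the issue.
        raise ValueError(
            f"k={k} exceeds the number of training samples ({len(train_x)})"
        )
    # Indices of the k training points closest to the query.
    nearest = sorted(
        range(len(train_x)), key=lambda i: abs(train_x[i] - query)
    )[:k]
    return statistics.fmean(train_y[i] for i in nearest)


random.seed(0)
n_total, n_train = 932, 699  # sizes quoted in the issue

# Synthetic data (assumption): one feature and a noisy linear target.
x = [random.uniform(500, 5000) for _ in range(n_total)]
y = [100 * xi + random.gauss(0, 50_000) for xi in x]
train_x, train_y = x[:n_train], y[:n_train]

# With k equal to the training size, every training point is a neighbor,
# so the prediction is the training mean regardless of the query.
pred_all = knn_regress(train_x, train_y, 2000.0, k=n_train)
train_mean = statistics.fmean(train_y)

# With k equal to the full dataset size, there are not enough neighbors.
try:
    knn_regress(train_x, train_y, 2000.0, k=n_total)
    raised = False
except ValueError:
    raised = True
```

After running this, `pred_all` matches `train_mean` and `raised` is `True`, which is consistent with the description above: k=699 reproduces the "mean of all training samples" shortcut, while k=932 cannot be fit on a 699-row training split.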

trevorcampbell commented 1 year ago

This was fixed at some point in the past. Closing.