[x] The text could be misread to imply that regression is only an important part of prediction, or that prediction is only about the temporal future. Regression is also an important part of inference, and prediction can be used in non-forecasting contexts (e.g., interpolating missing data, predicting counterfactual outcomes), so it might be good to clarify the language.
Consider clarifying how KNN “voting” works for regression at the beginning of 8.5. I can imagine students being confused until that part is reached later on. This is explained in this section, so ignoring.
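For context on the “voting” question above: in the regression setting, the majority vote of classification becomes an average of the neighbors' response values. A minimal sketch of that idea (the function name `knn_regress` and the toy housing data are invented for illustration, not taken from the book):

```python
import numpy as np

def knn_regress(X_train, y_train, x_new, k=5):
    """Predict by averaging the responses of the k nearest neighbors."""
    # Euclidean distance from x_new to every training point
    dists = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    # For regression, "voting" becomes averaging the neighbors' y-values
    return y_train[nearest].mean()

# Toy data: house size (sq ft) vs. sale price
X = np.array([[1000.0], [1200.0], [1500.0], [2000.0], [2200.0]])
y = np.array([200_000.0, 230_000.0, 260_000.0, 330_000.0, 360_000.0])
print(knn_regress(X, y, np.array([1300.0]), k=3))  # → 230000.0
```

The three nearest training houses (1200, 1500, and 1000 sq ft) have prices 230K, 260K, and 200K, so the prediction is their mean, 230K.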
[x] It might be worth discussing more subjectively how good or bad an 80K RMSE is, given the overall range of the target and what one might do with housing predictions. I think learning how to form these qualitative opinions is an important part of stats.
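One concrete way to form the kind of qualitative judgment this comment asks for is to express RMSE relative to the spread of the target. A small sketch (all price values here are invented for illustration, not the book's data):

```python
import numpy as np

# Hypothetical true sale prices and model predictions
y_true = np.array([250_000, 310_000, 180_000, 420_000, 275_000], dtype=float)
y_pred = np.array([260_000, 300_000, 200_000, 400_000, 285_000], dtype=float)

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
target_range = y_true.max() - y_true.min()
print(f"RMSE = {rmse:,.0f}")
print(f"RMSE as a share of the target range: {rmse / target_range:.1%}")
```

An 80K RMSE on houses spanning 180K to 420K would be a large fraction of the range; the same RMSE on a market spanning millions would look very different, which is exactly the kind of context-dependent reading the reviewer is after.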
[x] While I don’t have a crystal ball, I think students are going to need to learn the concept of model overfitting earlier and earlier. Great addition.
[x] kNN not being taught at an intro level must only be another legacy of the past, probably due to a lack of student access to cheap computing.
Can the authors be clear on what the actual workflow is here and whether it matches the workflow used for classification? If it does, the authors can just elaborate on the parts of the workflow that need to be adjusted from a classification setting to a regression setting. Too big of a change; also, we are not confident such a change would improve the book.
Can the authors also clarify whether K-nearest neighbours allows for non-linearity of relationships between predictors and even for interactions between predictors? Non-linear relationships being possible is stated in the strengths section, and we don't want to get into what interactions are in this book; that is too advanced for our audience.
Can the authors comment in the manuscript on whether underfitting and overfitting are something to be concerned about in the context of classification (not just regression)? We now cover this in classification.
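On the underfitting/overfitting point, the role of K makes the concern easy to demonstrate: K = 1 reproduces the training data exactly, while K equal to the training-set size predicts the global mean everywhere. A hedged sketch (the helper `knn_predict` and the simulated sine data are invented for illustration):

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k):
    """KNN regression on a 1-D predictor: mean response of k nearest points."""
    preds = []
    for x in X_query:
        dists = np.abs(X_train - x)
        nearest = np.argsort(dists)[:k]
        preds.append(y_train[nearest].mean())
    return np.array(preds)

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 10, 40))
y = np.sin(X) + rng.normal(0, 0.2, 40)

for k in (1, 5, 40):
    train_rmse = np.sqrt(np.mean((knn_predict(X, y, X, k) - y) ** 2))
    print(f"k={k:>2}: training RMSE = {train_rmse:.3f}")
# k=1 memorizes the training data (training RMSE 0, likely overfit);
# k=40 predicts the global mean everywhere (underfit);
# a moderate k sits in between.
```

The same trade-off appears for classification with majority voting, which is why covering it there as well seems worthwhile.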
[x] In the Classification chapters, the authors argued that a classifier should be “evaluated” (in order to be tuned) on the validation set and its accuracy should be assessed on the test set. However, in the Regression I chapter, the authors state that one should be evaluating (a regression model) on the test set. This seems both confusing and inconsistent. Can the authors use consistent workflows in the Classification and Regression chapters?
p195 typo: "matches does not match". We already fixed this in a previous pass.
[x] p196 bottom: Add that comparison for your example here to fully demonstrate your suggestion.
[x] p198: the limitations list is not formatted properly (needs an extra bullet point).
[x] p201: This is the plot from an OLS model; it needs to be updated to the surface from KNN regression. Your online version has the correct graphic.
[x] I feel like this chapter lacks a wrap-up after this last section. It seemed like it could naturally end with the strengths/weaknesses before. I do like the multiple-predictor discussion, but then it just ends.
Reviewer E: