UBC-DSCI / introduction-to-datascience

Open Source Textbook for DSCI100: Introduction to Data Science in R
https://datasciencebook.ca/

Feedback on regression chapters? #3

Closed: ttimbers closed this issue 3 years ago

ttimbers commented 5 years ago

@msalibian - I am looking for some feedback on the regression chapters for the DSCI 100 course notepack/textbook. I currently have these two chapters drafted:

The students have already read Chapter 8, but I can still make corrections and address them in class. Students will soon read Chapter 9, and the same goes for it: I am happy to address gaps/errors as needed. Comments are welcome in this issue thread, or you can directly edit the following Rmd files:

ttimbers commented 5 years ago

Also, please note that I am trying to keep this notepack/textbook as accessible as possible, so my language is often intentionally informal. I am happy to change things, however, if you think it would be better in some cases.

msalibian commented 5 years ago

@ttimbers I found the version of Chapter 8 of the book here to be different from the one in the Rmd file here, so I worked with the HTML version. Below are some suggestions for Chapter 8. I'll look at Chapter 9 later tonight, or maybe tomorrow (Wed). Congratulations on these notes. They are super valuable and will be a great resource to have in the Dept, not only for DSCI 100.

Matias's comments / suggestions

8.3 Regression

8.5

8.6

8.7

8.9.2

  1. "Does not perform well with a large number of predictors unless the size of the training set is exponentially larger"
  2. I'm torn about this limitation because it is not specific to K-NN, but it is true nonetheless.
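To make the quoted limitation concrete, here is a small Python sketch (the textbook itself uses R). It assumes the predictors are spread uniformly over the unit hypercube [0, 1]^d. It then computes two things as the number of predictors d grows: how wide a neighbourhood must be to contain 1% of the data, and how many training points are needed to keep 10 points along every predictor axis. The function names are hypothetical, chosen for illustration only.

```python
# Sketch: why K-NN needs far more data as the number of predictors grows.
# Assumption (illustrative only): predictors are uniform over the unit hypercube [0, 1]^d.


def edge_length_for_fraction(fraction, d):
    """Side length of a sub-cube that holds `fraction` of uniformly spread data in d dimensions."""
    return fraction ** (1.0 / d)


def points_for_fixed_spacing(points_per_axis, d):
    """Training-set size needed to keep `points_per_axis` grid points along every predictor axis."""
    return points_per_axis ** d


if __name__ == "__main__":
    for d in (1, 2, 5, 10):
        edge = edge_length_for_fraction(0.01, d)
        n = points_for_fixed_spacing(10, d)
        print(f"d={d:2d}: edge holding 1% of data = {edge:.3f}, points for 10 per axis = {n}")
```

Under this assumption, the neighbourhood holding 1% of the data spans 0.01 of the axis when d = 1 but about 0.63 of every axis when d = 10, so the "nearest" neighbours are no longer local. Meanwhile the number of points needed for a fixed spacing grows as 10^d. Nothing in the calculation refers to K-NN specifically, which matches the point that the limitation applies to other local methods too.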
msalibian commented 5 years ago

@ttimbers Here are the rest of my comments. Once again, congratulations! This is a great set of notes / textbook.

A couple of additional comments on Chapter 8, section 8.6

Chapter 9

9.1

9.4

9.5

ttimbers commented 5 years ago

@msalibian - Thanks for the very helpful feedback! I really appreciate it! Can I please add you as an author/contributor to credit you for your contributions? You also gave a lot of feedback on Melissa's classification chapters, so I think it is well deserved.

I have addressed all the feedback for Chapter 9 and the smaller changes for Chapter 8. I will keep this issue open and come back to address the bigger changes you suggested (pasted below to remind myself of what I have left to do) once this week's worksheet and lecture slides are done:

Instead of using 2000 sqf as the first example, I would start with 1250 sqf, where you have a few observations either on x = 1250 or almost on it. Then, intuitively, one would say that the price should be around $150K, since the y's are all around that value. We can then suggest taking the average of these values. I say this because for x = 2000 sqf there aren't any observations on x = 2000, so we need to borrow from neighbours "farther away"; since some of them are noticeably lower and others noticeably higher, taking their average may not be that intuitive to all the students. If instead the neighbours are all close to each other (as they are for x = 1250), averaging them may feel more natural.
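A minimal Python sketch of the suggested walkthrough follows. It uses made-up toy data, not the textbook's housing data set: several houses sit at or near 1250 sq ft and none at 2000 sq ft. The prediction is the plain average of the 5 nearest neighbours' prices. The helper names `nearest_neighbours` and `knn_predict` are hypothetical.

```python
# Sketch of K-NN regression with one predictor (house size).
# The data are made-up toy values, NOT the data set used in the textbook.

sizes = [1200, 1240, 1250, 1250, 1260, 1320, 1700, 1800, 2300, 2500]  # square feet
prices = [148000, 152000, 150000, 149000, 151000, 155000,
          170000, 190000, 260000, 300000]  # US dollars


def nearest_neighbours(x_train, x_new, k):
    """Indices of the k training points whose size is closest to x_new."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))
    return order[:k]


def knn_predict(x_train, y_train, x_new, k):
    """K-NN regression prediction: the average response of the k nearest neighbours."""
    idx = nearest_neighbours(x_train, x_new, k)
    return sum(y_train[i] for i in idx) / k


for x_new in (1250, 2000):
    neighbour_prices = sorted(prices[i] for i in nearest_neighbours(sizes, x_new, 5))
    print(x_new, neighbour_prices, knn_predict(sizes, prices, x_new, 5))
```

With these toy numbers, the five neighbours of 1250 sq ft all lie between $148K and $152K and average to $150K. The five neighbours of 2000 sq ft range from $155K to $300K and average to $215K, a value none of them is close to. That contrast is the reason for starting with 1250 sq ft.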

I would then end the section by showing predictions (using 5-NN on the whole data set) for a grid of square footage values, say seq(500, 5000, by=100) or something like that, before moving on to 8.6 to assess these predictions, for example.
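The grid idea can be sketched in Python on the same kind of made-up toy data (again, not the textbook's data set). Here `range(500, 5001, 100)` plays the role of R's `seq(500, 5000, by=100)`, and the 5-NN model uses every toy observation. The `knn_predict` helper is hypothetical and is redefined so the block runs on its own.

```python
# Sketch: 5-NN predictions over a grid of house sizes, fitted on all (toy) data.
# The data are made-up toy values, NOT the data set used in the textbook.

sizes = [1200, 1240, 1250, 1250, 1260, 1320, 1700, 1800, 2300, 2500]  # square feet
prices = [148000, 152000, 150000, 149000, 151000, 155000,
          170000, 190000, 260000, 300000]  # US dollars


def knn_predict(x_train, y_train, x_new, k):
    """Average response of the k training points closest to x_new."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))
    return sum(y_train[i] for i in order[:k]) / k


grid = list(range(500, 5001, 100))  # same values as seq(500, 5000, by=100) in R
preds = [knn_predict(sizes, prices, x, 5) for x in grid]

for x, p in zip(grid, preds):
    print(f"{x:5d} sq ft -> ${p:,.0f}")
```

On this toy data the curve is flat at $150,000 for every grid value up to 1200 sq ft and flat at $215,000 from 2500 sq ft up. Beyond the range of the data, the same five extreme houses are always the nearest neighbours. That flattening is a general property of K-NN and is worth pointing out to students when they look at the grid.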

8.6

Finish the section by showing the predictions with k = 5 and also those with the optimal k = 51 for the same grid used at the end of 8.5, for example.
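One way to see what changes between k = 5 and k = 51 on a shared grid is the following Python sketch. It uses synthetic data: a straight-line trend of 50 dollars per sq ft plus an alternating plus-or-minus 20000 "noise" term. These are neither the textbook data nor a claim about them, and the 51 is taken from the comment rather than tuned here. The `knn_predict` helper is hypothetical.

```python
# Sketch: K-NN predictions on one grid for a small and a large k.
# Synthetic data (NOT the textbook data): linear trend plus alternating +/- 20000 noise.
# k = 51 is taken from the discussion, not tuned for these data.

sizes = [500 + 20 * i for i in range(226)]  # 500, 520, ..., 5000 sq ft
prices = [50 * x + (20000 if i % 2 == 0 else -20000)  # trend + alternating noise
          for i, x in enumerate(sizes)]


def knn_predict(x_train, y_train, x_new, k):
    """Average response of the k training points closest to x_new."""
    order = sorted(range(len(x_train)), key=lambda i: abs(x_train[i] - x_new))
    return sum(y_train[i] for i in order[:k]) / k


grid = list(range(500, 5001, 100))
preds_k5 = [knn_predict(sizes, prices, x, 5) for x in grid]
preds_k51 = [knn_predict(sizes, prices, x, 51) for x in grid]

# Compare each curve with the underlying trend on interior grid points,
# away from the edges where the neighbourhoods become one-sided.
interior = [j for j, x in enumerate(grid) if 1500 <= x <= 4000]
err_k5 = max(abs(preds_k5[j] - 50 * grid[j]) for j in interior)
err_k51 = max(abs(preds_k51[j] - 50 * grid[j]) for j in interior)
print(f"max deviation from trend: k=5 -> {err_k5:.0f}, k=51 -> {err_k51:.0f}")
```

In the interior of this synthetic data, the k = 5 curve stays 4000 away from the trend, while the k = 51 curve stays within about 392. A larger k averages away more of the noise and gives a smoother curve. The usual cost is more bias near the edges of the data and wherever the true relationship bends.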

msalibian commented 5 years ago

@ttimbers No need to give me credit for these off-the-cuff comments and suggestions!

ttimbers commented 3 years ago

Thanks again @msalibian for this feedback; most, if not all, of it has been used to improve this chapter!