UBC-DSCI / introduction-to-datascience

Open Source Textbook for DSCI100: Introduction to Data Science in R
https://datasciencebook.ca/

Review: Ch 10 (clustering) #110

Closed: leem44 closed this issue 3 years ago

leem44 commented 3 years ago

Reviewer E:

leem44 commented 3 years ago

Reviewer B:

trevorcampbell commented 3 years ago

Reviewer D

trevorcampbell commented 3 years ago

Reviewer A

trevorcampbell commented 3 years ago
ttimbers commented 3 years ago

Reviewer E suggests this:

Why is the elbow method done manually instead of in the tidymodels framework? Example here: https://www.tidymodels.org/learn/statistics/k-means/#clustering-in-r . It may be good to make students work through it to internalize the formula, but if so, the chapter should at least highlight the other option for the long term.

However, having reviewed that link, as far as I can tell they do the same thing we do in our chapter (build the elbow plot from a data frame created using map and the broom functions), so I am going to ignore this.
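For readers following this thread, here is a rough sketch of what the elbow computation boils down to, regardless of framework: run K-means for each K, record the total within-cluster sum of squared distances, and look for the K where that curve stops dropping sharply. This is a Python analogue only, not the chapter's R code (which uses kmeans with map and broom). It assumes a hand-rolled Lloyd's algorithm with several random restarts, loosely mirroring R's nstart; the function names kmeans_wssd and elbow and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_wssd(points, k, n_starts=10, max_iter=100, seed=0):
    """Hypothetical helper: run Lloyd's K-means from several random starts
    and return the smallest total within-cluster sum of squared distances."""
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(n_starts):
        centers = rng.sample(points, k)  # start at k distinct data points
        for _ in range(max_iter):
            # Assignment step: each point joins its nearest center.
            labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
            # Update step: move each center to the mean of its assigned points.
            new_centers = []
            for j in range(k):
                members = [p for p, lab in zip(points, labels) if lab == j]
                if members:
                    new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
                else:
                    new_centers.append(centers[j])  # leave an empty cluster's center in place
            if new_centers == centers:
                break  # converged: assignments will no longer change
            centers = new_centers
        wssd = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
        best = min(best, wssd)
    return best


def elbow(points, k_max):
    """Hypothetical helper: total WSSD for each K from 1 to k_max."""
    return {k: kmeans_wssd(points, k) for k in range(1, k_max + 1)}


# Three well-separated toy blobs (made-up data, for illustration only).
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (0, 10), (1, 10), (0, 11)]
for k, w in elbow(data, 5).items():
    print(k, round(w, 2))
```

Plotting these values against K gives the elbow plot; on this toy data the curve should flatten after K = 3. Whether the values come from a manual loop like this or from a tidymodels pipeline, the quantity being plotted is the same.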

ttimbers commented 3 years ago

Reviewer B suggests this:

Clustering by itself seems a little out of place, but I think more-than-zero exposure to unsupervised learning is called for. Again, I understand that book bloat is a thing, but if you were desperate for another unsupervised learning topic, a 2D principal components analysis is IMO accessible to undergrads, especially if you take a visual approach.

I disagree. I think K-means is an accessible example of how to answer an exploratory question with a clustering approach. We also give only one example of one method in classification (KNN), which I think is also fine in such an introductory book. A 2D principal components analysis is accessible to certain undergrads, but not to our target audience; the students it would work for have a stronger math background.

What I can do is list some other examples of clustering, for example hierarchical clustering. And then add ISLR as an additional resource to explore this?

trevorcampbell commented 3 years ago

@ttimbers I agree: no PCA. Re hierarchical clustering, PCA, etc., just add those examples to the additional resources section (we do that in other chapters too, IIRC). I don't think I'd clutter the main chapter text with the intricacies of hierarchical clustering et al.

You could also mention at the beginning of the chapter that there are other kinds of unsupervised learning, but then point the reader to the additional resources (other chapters do that too, I think).

ttimbers commented 3 years ago

Reviewer E suggests:

"Unsupervised learning is often quite hard in practice and may not always find meaningful results. It could be useful in this section especially to show an example of the method struggling or failing."

However, we are short on time. Thinking this could be good to add for version 2 of the book? @trevorcampbell @leem44 - are you two OK with that?

trevorcampbell commented 3 years ago

agree!

ttimbers commented 3 years ago

Closed via #250