UBC-DSCI / introduction-to-datascience-python

Open Source Textbook for DSCI100: Introduction to Data Science in Python
https://python.datasciencebook.ca
Other

Clustering: mimic new R chapter where we tune num clusters #213

Open trevorcampbell opened 1 year ago

trevorcampbell commented 1 year ago

Right now, in the Python version of the book, we tune the number of clusters manually (we run a pipeline for each $k$, manually extract the results, and plot them). This matches the approach in the old version of the R book. The new version of the R book uses tidyclust, whose tuning method is more closely aligned with the one used in the classification and regression chapters.
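For reference, here is a minimal sketch of the manual approach described above. It assumes scikit-learn's `KMeans` with a `StandardScaler` preprocessing step, and it uses synthetic `make_blobs` data as a hypothetical stand-in for the book's data set. `n_init=10`, `random_state=1`, and the range of $k$ are arbitrary choices, not taken from the book.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; the book uses its own data set here.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=1)
data = pd.DataFrame(X, columns=["x1", "x2"])

rows = []
for k in range(1, 10):
    # One pipeline per k: standardize, then cluster into k groups.
    pipe = make_pipeline(
        StandardScaler(), KMeans(n_clusters=k, n_init=10, random_state=1)
    )
    pipe.fit(data)
    # Manually extract the total within-cluster sum of squared distances.
    rows.append({"k": k, "total_WSSD": pipe[-1].inertia_})

elbow_stats = pd.DataFrame(rows)
print(elbow_stats)
```

The resulting `elbow_stats` table (one row per $k$) is what then gets plotted as an elbow plot, for example with altair.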

Is there a similar update we can make to the py book?

Make sure to propagate this change to the worksheets if we do this.

joelostblom commented 1 year ago

I had the same thought and looked into this briefly. Based on what I found, I don't think it is easily possible; see https://github.com/scikit-learn/scikit-learn/issues/6154 for details. There are some workarounds suggested on Stack Overflow, but none of them are convenient.
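One possible workaround is sketched below under stated assumptions; it is not taken from the linked issue or from the book. The idea is to pass `GridSearchCV` a single hand-made split whose training and test sets are both the full data set. Its default scoring then calls the pipeline's `score`, which for `KMeans` returns the negative within-cluster sum of squared distances, and so reproduces the per-$k$ numbers from the manual loop. The synthetic `make_blobs` data and the `n_init`/`random_state` values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data.
X, _ = make_blobs(n_samples=300, centers=3, n_features=2, random_state=1)
data = pd.DataFrame(X, columns=["x1", "x2"])

# make_pipeline names the KMeans step "kmeans".
pipe = make_pipeline(StandardScaler(), KMeans(n_init=10, random_state=1))

# A single "split" whose train and test sets are both the full data set.
all_rows = np.arange(len(data))
search = GridSearchCV(
    pipe,
    param_grid={"kmeans__n_clusters": list(range(1, 10))},
    cv=[(all_rows, all_rows)],
    refit=False,  # the best score typically goes to the largest k, so don't refit
)
search.fit(data)  # no y: KMeans is unsupervised

# With scoring=None, the score is KMeans.score, i.e. the negative WSSD.
results = pd.DataFrame({
    "k": [p["kmeans__n_clusters"] for p in search.cv_results_["params"]],
    "total_WSSD": -search.cv_results_["mean_test_score"],
})
print(results)
```

This only replaces the loop. It does not choose $K$ for us, because a higher score (lower WSSD) typically comes with a larger $k$, so the elbow still has to be read off a plot of `results`.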