Closed trevorcampbell closed 1 year ago
@chendaniely found this and mentioned it to me -- copying the thread here. I'm very much in favour of testing this out thoroughly and possibly replacing our clustering material with this to make the book consistent / cleaner.
We can look into doing everything within tidymodels now: https://www.tidyverse.org/blog/2022/12/tidyclust-0-1-0/
For the clustering slides + worksheet + tutorial
example code from the post:
    kmeans_spec <- k_means(num_clusters = 4) %>%
      set_engine("ClusterR")
    kmeans_spec
    #> K Means Cluster Specification (partition)
    #>
    #> Main Arguments:
    #>   num_clusters = 4
    #>
    #> Computational engine: ClusterR

    data("ames", package = "modeldata")

    rec_spec <- recipe(~ ., data = ames) %>%
      step_dummy(all_nominal_predictors()) %>%
      step_zv(all_predictors()) %>%
      step_normalize(all_numeric_predictors()) %>%
      step_pca(all_numeric_predictors(), threshold = 0.8)

    kmeans_wf <- workflow(rec_spec, kmeans_spec)

    kmeans_fit <- fit(kmeans_wf, data = ames)
    kmeans_fit
    #> ══ Workflow [trained] ══════════════════════════════════════════════════════════
    #> Preprocessor: Recipe
    #> Model: k_means()
    #>
    #> ── Preprocessor ────────────────────────────────────────────────────────────────
    #> 4 Recipe Steps
    #>
    #> • step_dummy()
    #> • step_zv()
    #> • step_normalize()
    #> • step_pca()
    #>
    #> ── Model ───────────────────────────────────────────────────────────────────────
    #> KMeans Cluster
    #>  Call: ClusterR::KMeans_rcpp(data = data, clusters = clusters)
    #>  Data cols: 121
    #>  Centroids: 4
    #>  BSS/SS: 0.1003306
    #>  SS: 646321.6 = 581475.8 (WSS) + 64845.81 (BSS)

    extract_cluster_assignment(kmeans_fit)
    #> # A tibble: 2,930 × 1
    #>    .cluster
    #>    <fct>
    #>  1 Cluster_1
    #>  2 Cluster_1
    #>  3 Cluster_1
    #>  4 Cluster_1
    #>  5 Cluster_2
    #>  6 Cluster_2
    #>  7 Cluster_2
    #>  8 Cluster_2
    #>  9 Cluster_2
    #> 10 Cluster_2
    #> # … with 2,920 more rows

    predict(kmeans_fit, new_data = slice_sample(ames, n = 10))
    #> # A tibble: 10 × 1
    #>    .pred_cluster
    #>    <fct>
    #>  1 Cluster_4
    #>  2 Cluster_2
    #>  3 Cluster_4
    #>  4 Cluster_3
    #>  5 Cluster_1
    #>  6 Cluster_4
    #>  7 Cluster_2
    #>  8 Cluster_2
    #>  9 Cluster_1
    #> 10 Cluster_4