utterances-bot commented 3 years ago

The complete guide to clustering analysis: k-means and hierarchical clustering by hand and in R - Stats and R

Learn how to perform clustering analysis, namely k-means and hierarchical clustering, by hand and in R. See also how the different clustering algorithms work

https://statsandr.com/blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/

AntoineSoetewey commented 3 years ago

Comment written by royr2 on February 15, 2020 04:29:53:

Thanks for this amazing post! Very clear, structured and pedagogical in nature (especially how you have repeated the same set of steps again and again for better assimilation).

Thanks Antoine, much appreciated!

Comment written by Antoine Soetewey on February 15, 2020 06:02:53:

You are welcome! Glad you liked it.

AntoineSoetewey commented 3 years ago

Comment written by Kristina on February 18, 2020 20:22:34:

Hi Antoine, thank you for this really complete article!

I just read an article describing a two-step clustering, using hierarchical clustering first, and then a non-hierarchical clustering using the "cluster means derived from the hierarchical clustering as starting point". They don't explain more, and I would really like to do this.

However, I don't know how to do it: which object from the hclust result do I have to use, and how? Any suggestions?

Comment written by Antoine Soetewey on February 19, 2020 12:38:50:

Hi Kristina, thanks for your comment.

First, the final clusters from a hierarchical clustering can be extracted with cutree(hclust, k = 2), where hclust is the result of your clustering and k = 2 is the desired number of clusters.

Second, perhaps the {prcr} package may be what you are looking for: "hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure".
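
For readers who prefer base R, here is a minimal sketch of the two-step idea described above: cut the hierarchical tree with cutree(), compute the mean of each resulting cluster, and pass those means to kmeans() as starting centers. The dataset (the built-in USArrests), the Ward linkage, k = 3 and the object names (dat, hc, groups, init_centers) are assumptions chosen for illustration; they do not come from the article or from the {prcr} package.

```r
# Illustrative data (an assumption): built-in USArrests, standardized
dat <- scale(USArrests)

# Step 1: hierarchical clustering, then cut the tree into k groups
k <- 3  # number of clusters, chosen arbitrarily for this sketch
hc <- hclust(dist(dat), method = "ward.D2")
groups <- cutree(hc, k = k)

# Step 2: mean of every variable within each hierarchical cluster
# (result is a k x p matrix, one row per cluster)
init_centers <- apply(dat, 2, function(col) tapply(col, groups, mean))

# Step 3: k-means started from those means instead of random centers
km <- kmeans(dat, centers = init_centers)

# Cross-tabulate both partitions to see how many points changed cluster
table(hclust = groups, kmeans = km$cluster)
```

The hierarchical result is stored in hc rather than hclust here, only to avoid reusing the name of the hclust() function.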

Hope this helps!

AntoineSoetewey commented 3 years ago

Comment written by Salman Alk on August 05, 2020 02:41:47:

very nice.
Thank you

Comment written by Antoine Soetewey on August 05, 2020 05:39:52:

Glad you like it Salman!

AntoineSoetewey commented 3 years ago

Comment written by kathroji saikrishna on September 26, 2020 06:09:24:

I am impressed by the information that you have on this blog. It shows how well you understand this subject.

Comment written by Antoine Soetewey on September 26, 2020 06:18:54:

Thanks for your kind feedback. I always try to make my articles as complete as possible.

johnsonlab commented 3 years ago

Hi Antoine - thanks for the really comprehensive dive into this topic. Here's my question: is there a statistical test that provides a relative measure of how distinct k-means clusters are from one another? Answers to similar questions elsewhere seem to suggest that the production of the clusters is its own validation of the "distinctness" of the clusters, and that no post-test would be informative for that reason. I can convince myself that this is correct, but want to be sure. Do you have any additional guidance? All the best, Josh

AntoineSoetewey commented 3 years ago

Dear Josh,

Thanks for this interesting question.

First of all, I am not aware of any statistical test that provides a relative measure of how distinct clusters are. The fact that clusters are constructed following the k-means algorithm makes them, by construction, as different/distinct as possible (since similar points are grouped together and distant points are separated into different clusters). That does not necessarily mean there is no such test; it's just that I don't know of any.

However, here are a few points I'd like to mention and which may be of interest to you:

- Thanks to your comment, I added the silhouette plot in this section. The two plots mentioned in this section can help you to (visually at least) determine how distinct clusters are.
- You can also compute the Euclidean distance between points and their center. If these distances are small, it indicates that points are close to their center and clusters are thus more likely to be distinct.
- If you really need a statistical test (and if you don't find any online or in textbooks), you can also compare the final groups via a Student's t-test in case of 2 clusters, or an ANOVA in case of 3 groups or more. These statistical tests will not tell you how distinct the clusters are, but they will tell you if the clusters are significantly different from each other or not.
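
The three points above could be sketched in R as follows. This is only an illustration under stated assumptions: the built-in USArrests data, 3 clusters, the seed, and the object names (dat, km, sil, dist_to_center, fit) are made up for the example, and silhouette() comes from the {cluster} package. Keep in mind that the clusters are built from the same variables that the ANOVA then compares, so its p-value is best read as descriptive.

```r
library(cluster)  # provides silhouette()

# Illustrative data and clustering (assumptions, not the article's data)
dat <- scale(USArrests)
set.seed(42)
km <- kmeans(dat, centers = 3, nstart = 25)

# 1. Silhouette widths: values close to 1 suggest well-separated points
sil <- silhouette(km$cluster, dist(dat))
mean(sil[, "sil_width"])  # average silhouette width over all points

# 2. Euclidean distance between each point and its own cluster center
centers_per_point <- km$centers[km$cluster, , drop = FALSE]
dist_to_center <- sqrt(rowSums((dat - centers_per_point)^2))
tapply(dist_to_center, km$cluster, mean)  # mean distance per cluster

# 3. One-way ANOVA comparing one variable across the 3 clusters
cluster_data <- data.frame(dat, cluster = factor(km$cluster))
fit <- aov(Murder ~ cluster, data = cluster_data)
summary(fit)
```

Calling plot(sil) draws the silhouette plot itself. As a sanity check, the sum of the squared distances in dist_to_center should match km$tot.withinss.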

Hope this helps.

Regards, Antoine

johnsonlab commented 3 years ago

Antoine, this was super helpful, for me and also some local colleagues as we wrap up an analysis. Great site, 4/4 stars would comment again!

AntoineSoetewey commented 3 years ago

Thanks for your kind feedback!

AntoineSoetewey / statsandr

blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/ #21
