AntoineSoetewey / statsandr

A blog on statistics and R aiming to help academics and professionals who work with data grasp important concepts in statistics and apply them in R. See www.statsandr.com
http://statsandr.com/

blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/ #21

Closed: utterances-bot closed this issue 3 years ago

utterances-bot commented 3 years ago

The complete guide to clustering analysis: k-means and hierarchical clustering by hand and in R - Stats and R

Learn how to perform clustering analysis, namely k-means and hierarchical clustering, by hand and in R. See also how the different clustering algorithms work.

https://statsandr.com/blog/clustering-analysis-k-means-and-hierarchical-clustering-by-hand-and-in-r/

AntoineSoetewey commented 3 years ago

Comment written by royr2 on February 15, 2020 04:29:53:

Thanks for this amazing post! Very clear, structured and pedagogical in nature, especially the way you repeat the same set of steps again and again for better assimilation.

Thanks Antoine, much appreciated !

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on February 15, 2020 06:02:53:

You are welcome! Glad you liked it.

AntoineSoetewey commented 3 years ago

Comment written by Kristina on February 18, 2020 20:22:34:

Hi Antoine, thank you for this really complete article!

I just read an article describing a two-step clustering: hierarchical clustering first, then a non-hierarchical clustering using the "cluster means derived from the hierarchical clustering as starting point". They don't explain it further, and I would really like to do this.

However, I don't know how to do it: which object from the hclust output should I use, and how? Any suggestions?

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on February 19, 2020 12:38:50:

Hi Kristina, thanks for your comment.

First, the final clusters of a hierarchical clustering can be extracted with cutree(hclust, k = 2), where hclust is the object returned by hclust() and k = 2 is the number of desired clusters.

Second, the {prcr} package may be what you are looking for: "hierarchical clustering is performed to determine the initial partition for the subsequent k-means clustering procedure".

Hope this helps!
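A minimal sketch of this two-step approach in base R, under illustrative assumptions that do not come from Kristina's data (the built-in iris measurements standardized with scale(), Euclidean distances, Ward linkage and k = 3): cut the tree with cutree(), average the observations in each cluster to get the cluster means, and pass those means to kmeans() as starting centers. This is one reading of "cluster means derived from the hierarchical clustering as starting point", not necessarily the exact procedure of the article she mentions or of {prcr}.

```r
# Illustrative data (assumption): the four numeric iris columns, standardized
X <- scale(iris[, 1:4])

# Step 1: hierarchical clustering (Ward linkage on Euclidean distances)
hc <- hclust(dist(X), method = "ward.D2")

# Extract k clusters from the dendrogram
k <- 3
groups <- cutree(hc, k = k)

# Step 2: cluster means of the hierarchical solution, one row per cluster
centers <- apply(X, 2, function(col) tapply(col, groups, mean))

# Step 3: k-means started from those means instead of random points
km <- kmeans(X, centers = centers)

# Cross-tabulate the two partitions (k-means cluster j started at the mean of hierarchical cluster j)
table(hierarchical = groups, kmeans = km$cluster)
```

Because the starting centers are fixed, kmeans() is deterministic here, so no set.seed() is needed; the off-diagonal counts of the final table are the observations that k-means moved away from their hierarchical cluster.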

AntoineSoetewey commented 3 years ago

Comment written by Salman Alk on August 05, 2020 02:41:47:

Very nice. Thank you!

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on August 05, 2020 05:39:52:

Glad you like it, Salman!

AntoineSoetewey commented 3 years ago

Comment written by kathroji saikrishna on September 26, 2020 06:09:24:

I am impressed by the information that you have on this blog. It shows how well you understand this subject.

AntoineSoetewey commented 3 years ago

Comment written by Antoine Soetewey on September 26, 2020 06:18:54:

Thanks for your kind feedback. I always try to make my articles as complete as possible.

johnsonlab commented 3 years ago

Hi Antoine - thanks for the really comprehensive dive into this topic. Here's my question: is there a statistical test that provides a relative measure of how distinct k-means clusters are from one another? Answers to similar questions elsewhere seem to suggest that the production of the clusters is its own validation of their "distinctness", so no post-test would be informative. I can convince myself that this is correct, but want to be sure. Do you have any additional guidance? All the best, Josh

AntoineSoetewey commented 3 years ago

Dear Josh,

Thanks for this interesting question.

First of all, I am not aware of any statistical test that provides a relative measure of how distinct clusters are. By construction, the k-means algorithm makes the clusters as distinct as it can: it assigns points so as to minimize the total within-cluster sum of squares, which, since the total sum of squares is fixed by the data, is the same as maximizing the between-cluster sum of squares (similar points end up in the same cluster and distant points in different ones). Note that the algorithm only guarantees a local optimum, which is why it is usually run from several random starts. That does not mean no such test exists; I simply don't know of one.

However, here are a few points I'd like to mention and which may be of interest to you:

Hope this helps.

Regards, Antoine
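Not a statistical test, but a hedged illustration of the point above in R: kmeans() itself reports how the total sum of squares splits into within- and between-cluster parts, and the average silhouette width from the {cluster} package (a recommended package shipped with R) is a common descriptive summary of how well separated the clusters are. The iris data, k = 3 and nstart = 25 are illustrative assumptions. Neither number comes with a p-value, and the between/total ratio tends to increase with k even when the data have no real cluster structure, so both should be read as descriptive only.

```r
library(cluster)  # recommended package shipped with R; provides silhouette()

# Illustrative data (assumption): the four numeric iris columns, standardized
X <- scale(iris[, 1:4])

set.seed(42)  # k-means starts from random centers, so fix the seed for reproducibility
km <- kmeans(X, centers = 3, nstart = 25)

# Share of the total sum of squares explained by the separation between clusters
# (print(km) shows the same quantity as "between_SS / total_SS")
ratio <- km$betweenss / km$totss
ratio

# Average silhouette width: near 1 = well separated, near 0 = overlapping,
# negative = points closer to a neighboring cluster than to their own
sil <- silhouette(km$cluster, dist(X))
avg_sil <- mean(sil[, "sil_width"])
avg_sil
```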

johnsonlab commented 3 years ago

Antoine, this was super helpful, for me and for some local colleagues as we wrap up an analysis. Great site, 4/4 stars, would comment again!

AntoineSoetewey commented 3 years ago

Thanks for your kind feedback!