fferroni / DEC-Keras

Deep Embedding Clustering in Keras
GNU General Public License v3.0
129 stars, 59 forks

Question: Unknown number of clusters? #7

Open jolespin opened 6 years ago

jolespin commented 6 years ago

Do you know if this has been used with an unknown number of clusters? Thanks for posting this, btw! Looking at the code makes a lot more sense to me than interpreting math symbols.

fferroni commented 6 years ago

There are a number of conventional clustering algorithms that do not require a known number of clusters, such as DBSCAN. However, it might be tricky to choose your epsilon parameter (in DBSCAN, this is the maximum distance at which two points are still considered neighbours), especially if your latent space dimensions are changing. I’ve never tried it though, so maybe you can tell me if it works!
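
A minimal sketch of the DBSCAN idea, not taken from this repository and untested on DEC itself. It assumes scikit-learn is available and uses synthetic blobs as a stand-in for the encoder output (in practice `Z` would be the latent vectors from your pre-trained autoencoder). The 95th-percentile rule for epsilon is an arbitrary illustrative choice based on the k-distance heuristic, not a recommendation from the thread.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Stand-in for the latent embeddings; replace with your own encoder output.
Z, _ = make_blobs(n_samples=600, centers=3, n_features=10,
                  cluster_std=0.5, random_state=0)

min_samples = 10  # assumed value; DBSCAN counts the point itself

# k-distance heuristic: for each point, the distance to its
# min_samples-th nearest neighbour (the point itself is the first one
# when the training data is queried).
nn = NearestNeighbors(n_neighbors=min_samples).fit(Z)
distances, _ = nn.kneighbors(Z)
k_dist = np.sort(distances[:, -1])

# Crude choice of epsilon: a high percentile of the k-distances, so most
# points become core points. Plotting k_dist and picking the knee by eye
# is the more common approach.
eps = float(np.percentile(k_dist, 95))

labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(Z)

# DBSCAN marks noise with the label -1, which is not a cluster.
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int(np.sum(labels == -1))
print(f"eps={eps:.3f}, clusters={n_clusters}, noise points={n_noise}")
```

Because epsilon is derived from the embeddings themselves, it has to be recomputed whenever the latent space changes, which is the difficulty mentioned above.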

One thing you could always do is decide the number of clusters after pre-training your autoencoder, either via the elbow method if you’re using k-means, or visually if you have low enough dimensions.
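
A rough sketch of the elbow idea, again assuming scikit-learn and synthetic blobs in place of the real embeddings from the pre-trained encoder. The elbow is normally read off a plot of inertia against k; the numeric rule here (stop when one more cluster reduces inertia by less than 20%) is only an assumed stand-in for that visual judgement, and the threshold is hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Stand-in for the latent embeddings; replace with your own encoder output.
Z, _ = make_blobs(n_samples=600, centers=4, n_features=10,
                  cluster_std=0.5, random_state=0)

candidate_k = list(range(1, 11))
inertias = []
for k in candidate_k:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(Z)
    # inertia_ is the within-cluster sum of squared distances.
    inertias.append(km.inertia_)

# rel_gain[i] is the fractional drop in inertia when going from
# candidate_k[i] clusters to candidate_k[i + 1] clusters.
inertias_arr = np.array(inertias)
rel_gain = 1.0 - inertias_arr[1:] / inertias_arr[:-1]

threshold = 0.2  # assumed cut-off, tune for your data
below = np.flatnonzero(rel_gain < threshold)
# First k after which adding a cluster no longer helps much;
# fall back to the largest candidate if no such k exists.
elbow_k = candidate_k[below[0]] if below.size else candidate_k[-1]

for k, inertia in zip(candidate_k, inertias):
    print(f"k={k:2d}  inertia={inertia:.1f}")
print("suggested number of clusters:", elbow_k)
```

The chosen value could then be passed as the number of clusters when initialising DEC, since the cluster count has to be fixed before the clustering layer is trained.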

jolespin commented 6 years ago

Do you mean piping the autoencoder or t-SNE embeddings into DBSCAN? I only heard about autoencoders and DEC earlier this week, so I'm fairly new to their architecture.
