DTC is a deep clustering method, meaning that it jointly optimizes the representation of the data (via the autoencoder) and the clustering. Optimization is done with gradient descent (SGD), as usual in neural nets. If you look at the loss function, it is a combination of the autoencoder's MSE reconstruction loss and a KL-divergence clustering loss. For this reason, we need a clustering algorithm whose parameters can be optimized either by gradient descent (here we use a soft center-based clustering, similar to k-means but with a differentiable KL-divergence loss), or by alternating optimization of the AE and the clustering (i.e. updating only one set of parameters at a time).
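For intuition, here is a minimal NumPy sketch of such a joint loss, assuming DEC-style Student's-t soft assignments; the names (`soft_assignments`, `gamma`, `alpha`) are illustrative, not the library's actual API:

```python
import numpy as np

def soft_assignments(z, centers, alpha=1.0):
    """Student's-t similarity between latent points z (n, d) and cluster
    centers (k, d); differentiable w.r.t. both z and the centers."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Sharpened target P that emphasizes high-confidence assignments."""
    w = q ** 2 / q.sum(axis=0)
    return w / w.sum(axis=1, keepdims=True)

def joint_loss(x, x_rec, z, centers, gamma=0.1):
    """Joint loss: autoencoder MSE + gamma * KL(P || Q) clustering term."""
    mse = ((x - x_rec) ** 2).mean()
    q = soft_assignments(z, centers)
    p = target_distribution(q)
    kl = (p * np.log(p / q)).sum(axis=1).mean()
    return mse + gamma * kl
```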
Either way, I don't see how it could work with agglomerative clustering, because agglomerative clustering has no straightforward loss function and cannot be optimized with SGD.
BUT you can of course use only the ConvLSTM autoencoder to encode your data first (training with the reconstruction loss alone), and then apply agglomerative clustering to the latent representations, using any distance metric you like.
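A sketch of that two-stage pipeline, assuming the encoder's latent vectors are available as a 2-D array (the `encoder.predict(X)` call and the threshold value are placeholders):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# latents: (n_samples, latent_dim) encodings from the pretrained
# ConvLSTM autoencoder, e.g. latents = encoder.predict(X).
latents = np.random.rand(100, 16)  # stand-in for real encodings

# No fixed number of clusters: cut the dendrogram at a distance threshold d.
d = 2.5  # illustrative value; tune for your data
agg = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=d,
    compute_full_tree=True,  # required when distance_threshold is set
    linkage="average",
)
labels = agg.fit_predict(latents)
print("found", agg.n_clusters_, "clusters")
```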
Concerning the heatmap: it is based on a supervised classification network, so it needs to know the number of classes.
I am testing this out on a music-similarity dataset, which does not have a defined number of clusters. Would your DTC library work the same with agglomerative clustering where
`n_clusters=None, distance_threshold=d, compute_full_tree=True`?
It would seem that TSClusteringLayer and heatmap generation require n_clusters.