Neuralized K-Means: make k-means amenable to neural network explanations

Description

K-Means finds some centroids $\mathbf{\mu}_1,\ldots,\mathbf{\mu}_K$ and assigns data points to a cluster as $c = {\rm argmin}_k \lbrace \Vert\boldsymbol{x} - \mathbf{\mu}_k\Vert^2\rbrace$, or in code:

c = torch.argmin(torch.cdist(x, centroids)**2)

Neither the assignment $c$ nor the distance $\Vert\boldsymbol{x} - \mathbf{\mu}_c\Vert^2$ are really explainable. The assignment $c$ is not continuous, and the distance is in fact measuring a dissimilarity to the cluster, essentially the opposite of what we want to explain.

Fixes

From Clustering to Cluster Explanation via Neural Networks (2022) present a method based on neuralization. They show that the k-means cluster assignment can be exactly rewritten as a set of piecewise linear functions

f_c(\boldsymbol{x}) = \min_{k\neq c}\lbrace \boldsymbol{w}_{ck}^\top\boldsymbol{x} + b_{ck}\rbrace

and the original k-means cluster assignment can be recovered as $c = {\rm argmax}_k\lbrace f_k(\boldsymbol{x})\rbrace$.

The cluster discriminant $f_c$ is a measure of cluster membership (instead of a dissimilarity) and is also structurally amenable to LRP and friends. It can also be plugged on top of a neural network feature extractor to make deep cluster assignments explainable.

Additional Information

There is no pairwise distance layer in PyTorch. Since zennit canonizers apply to torch.nn.Module modules, such module would need to be added
Each discriminant $f_c$ has it's own weight matrix $W\in\mathbb{R}^{(K-1)\times D}$. In principle, a neuralized k-means layer could be written as a layer
```
torch.nn.Sequential(
torch.nn.Linear(in_features=D, out_features=K*(K-1)), 
torch.nn.Unflatten(1, torch.Size([K, K-1]))
)
```
or alternatively via torch.einsum, which I think is easier. Either way, a special layer is needed for that.
The paper suggests a specific LRP rule for the $\min_{k\neq c}\lbrace s_k\rbrace$ to make the explanations continuous. In practice, we can replace it by $\beta^{-1}{\rm LogMeanExp}_k\lbrace \beta\cdot s_k\rbrace$ which approximates the min for $\beta < 0$, but has continuous gradients. Gradient propagation through this layer recovers exactly the LRP rule from the paper, but this also provides explanation continuity for other gradient-based explanation methods. I would suggest to implement such a LogMeanExpPool layer. Alternative could be to implement a MinPool layer with a custom LRP rule.
A canonizer could be used to replace the distance/k-means layer by it's neuralized counterpart. Alternatively, a composite might also work, but it seems a bit more complicated (where are the weights computed? how to set LRP rules for the linear/einsum layer?)

chr5tphr / zennit