brando90 / ultimate-anatome

Ἀνατομή is a PyTorch library to analyze representations of neural networks

when to normalize by the Frobenius norm? #4

Closed brando90 closed 3 years ago

brando90 commented 3 years ago
2 Problem Setup: Metrics and Models
Our goal is to quantify the similarity between two different groups of neurons (usually layers). We do this by comparing how their activations behave on the same dataset. Thus for a layer with p1 neurons, we define A ∈ R^{p1×n}, the matrix of activations of the p1 neurons on n data points, to be that layer's raw representation of the data. Similarly, let B ∈ R^{p2×n} be a matrix of the activations of p2 neurons on the same n data points. We center and normalize these representations before computing dissimilarity, per standard practice. Specifically, for a raw representation A we first subtract the mean value from each column, then divide by the Frobenius norm, to produce the normalized representation A∗, used in all our dissimilarity computations. In this work we study dissimilarity measures d(A∗, B∗) that allow for quantitative comparisons of representations both within and across different networks. We colloquially refer to values of d(A∗, B∗) as distances, although they do not necessarily satisfy the triangle inequality required of a proper metric.

We study five dissimilarity measures: centered kernel alignment (CKA), three measures derived from canonical correlation analysis (CCA), and a measure derived from the orthogonal Procrustes problem. As argued in Kornblith et al. [11], similarity measures should be invariant to left orthogonal transformations to accommodate the symmetries of neural networks, and all five measures satisfy this requirement.
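
For concreteness, a minimal sketch of the centering + Frobenius-normalization step described in the quote, assuming A has shape [p, n] (neurons × data points) as above; the function name and axis handling are my own, not anatome's API:

```python
import torch

def center_and_frobenius_normalize(A: torch.Tensor) -> torch.Tensor:
    """Produce the normalized representation A* from a raw representation A.

    Assumes A has shape [p, n] (p neurons, n data points), following the quoted
    setup: subtract the mean value from each column, then divide the result by
    its Frobenius norm.
    """
    A_centered = A - A.mean(dim=0, keepdim=True)             # remove each column's mean
    return A_centered / torch.linalg.norm(A_centered, ord="fro")  # scale by Frobenius norm
```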
brando90 commented 3 years ago

ref: https://arxiv.org/pdf/2108.01661.pdf

Grounding Representation Similarity with Statistical Testing

brando90 commented 3 years ago

For now, as long as the centering is done correctly, the sanity checks pass.
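
As an illustration of the kind of sanity check meant here, a self-contained sketch using linear CKA (plain torch, not anatome's actual test suite): after centering, the similarity of a representation with itself should be 1, and it should be unchanged by an orthogonal transformation.

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between X [n, p1] and Y [n, p2]; columns are centered first."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    num = torch.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = torch.linalg.norm(X.T @ X, ord="fro") * torch.linalg.norm(Y.T @ Y, ord="fro")
    return num / den

X = torch.randn(512, 64)
Q, _ = torch.linalg.qr(torch.randn(64, 64))  # random orthogonal matrix
assert torch.isclose(linear_cka(X, X), torch.tensor(1.0), atol=1e-4)
assert torch.isclose(linear_cka(X @ Q, X), linear_cka(X, X), atol=1e-4)  # orthogonal invariance
```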

brando90 commented 3 years ago

decision:

Always center! Be careful with dividing by the Frobenius norm, and favour dividing by the standard deviation where possible.
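
A minimal sketch of what that decision could look like as a preprocessing step; the function, the `use_std` flag, and the [p, n] layout are assumptions for illustration, not anatome's API:

```python
import torch

def preprocess(A: torch.Tensor, use_std: bool = True, eps: float = 1e-8) -> torch.Tensor:
    """Always center; prefer per-neuron std over the Frobenius norm for scaling.

    A is assumed to have shape [p, n] (neurons x data points); the axis
    convention is an assumption made for this sketch.
    """
    A = A - A.mean(dim=1, keepdim=True)                 # always center each neuron across data points
    if use_std:
        return A / (A.std(dim=1, keepdim=True) + eps)   # per-neuron standardization (preferred)
    return A / (torch.linalg.norm(A, ord="fro") + eps)  # Frobenius normalization (use with care)
```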