brando90 / ultimate-anatome

Ἀνατομή is a PyTorch library to analyze representations of neural networks

when to normalize by the Frobenius norm? #4

Closed brando90 closed 3 years ago

brando90 commented 3 years ago
2 Problem Setup: Metrics and Models
Our goal is to quantify the similarity between two different groups of neurons (usually layers). We do this by comparing how their activations behave on the same dataset. Thus for a layer with p1 neurons, we define A ∈ R^{p1×n}, the matrix of activations of the p1 neurons on n data points, to be that layer's raw representation of the data. Similarly, let B ∈ R^{p2×n} be a matrix of the activations of p2 neurons on the same n data points. We center and normalize these representations before computing dissimilarity, per standard practice. Specifically, for a raw representation A we first subtract the mean value from each column, then divide by the Frobenius norm, to produce the normalized representation A∗, used in all our dissimilarity computations. In this work we study dissimilarity measures d(A∗, B∗) that allow for quantitative comparisons of representations both within and across different networks. We colloquially refer to values of d(A∗, B∗) as distances, although they do not necessarily satisfy the triangle inequality required of a proper metric.

We study five dissimilarity measures: centered kernel alignment (CKA), three measures derived from canonical correlation analysis (CCA), and a measure derived from the orthogonal Procrustes problem. As argued in Kornblith et al. [11], similarity measures should be invariant to left orthogonal transformations to accommodate the symmetries of neural networks, and all five measures satisfy this requirement.
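
For concreteness, a minimal sketch of the centering + Frobenius-normalization step described in the quote, assuming A has shape [p, n] (neurons × data points) as above; the function name and axis handling are my own, not anatome's API:

```python
import torch

def center_and_frobenius_normalize(A: torch.Tensor) -> torch.Tensor:
    """Produce the normalized representation A* from a raw representation A.

    Assumes A has shape [p, n] (p neurons, n data points), following the quoted
    setup: subtract the mean value from each column, then divide the result by
    its Frobenius norm.
    """
    A_centered = A - A.mean(dim=0, keepdim=True)             # remove each column's mean
    return A_centered / torch.linalg.norm(A_centered, ord="fro")  # scale by Frobenius norm
```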
brando90 commented 3 years ago

ref: https://arxiv.org/pdf/2108.01661.pdf

Grounding Representation Similarity with Statistical Testing

brando90 commented 3 years ago

For now, as long as the centering is done correctly, the sanity checks pass.
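
As an illustration of the kind of sanity check meant here, a self-contained sketch using linear CKA (plain torch, not anatome's actual test suite): after centering, the similarity of a representation with itself should be 1, and it should be unchanged by an orthogonal transformation.

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor) -> torch.Tensor:
    """Linear CKA between X [n, p1] and Y [n, p2]; columns are centered first."""
    X = X - X.mean(dim=0, keepdim=True)
    Y = Y - Y.mean(dim=0, keepdim=True)
    num = torch.linalg.norm(Y.T @ X, ord="fro") ** 2
    den = torch.linalg.norm(X.T @ X, ord="fro") * torch.linalg.norm(Y.T @ Y, ord="fro")
    return num / den

X = torch.randn(512, 64)
Q, _ = torch.linalg.qr(torch.randn(64, 64))  # random orthogonal matrix
assert torch.isclose(linear_cka(X, X), torch.tensor(1.0), atol=1e-4)
assert torch.isclose(linear_cka(X @ Q, X), linear_cka(X, X), atol=1e-4)  # orthogonal invariance
```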

brando90 commented 3 years ago

decision:

Always center! Be careful with dividing by the Frobenius norm, and favour dividing by the standard deviation where possible.
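
A minimal sketch of what that decision could look like as a preprocessing step; the function, the `use_std` flag, and the [p, n] layout are assumptions for illustration, not anatome's API:

```python
import torch

def preprocess(A: torch.Tensor, use_std: bool = True, eps: float = 1e-8) -> torch.Tensor:
    """Always center; prefer per-neuron std over the Frobenius norm for scaling.

    A is assumed to have shape [p, n] (neurons x data points); the axis
    convention is an assumption made for this sketch.
    """
    A = A - A.mean(dim=1, keepdim=True)                 # always center each neuron across data points
    if use_std:
        return A / (A.std(dim=1, keepdim=True) + eps)   # per-neuron standardization (preferred)
    return A / (torch.linalg.norm(A, ord="fro") + eps)  # Frobenius normalization (use with care)
```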