jprobichaud opened this issue 2 years ago
Hi, I am glad that you find this code useful.
Yes, the numerical issues occur mostly because the features of the two models have different scales. I usually tackle the issue in two ways:

1) Simply normalize the features (or the autocorrelation matrices) before calculating HSIC.
2) Compare only the outputs of the networks' normalization layers. These are usually well behaved, since they are crucial for network stability.
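For the first approach, something along these lines should work. This is only a minimal sketch of linear CKA in the unbatched formulation of Kornblith et al., not this repo's minibatch estimator; since CKA is scale-invariant, rescaling by the Frobenius norms leaves the value unchanged and just keeps the HSIC terms in a numerically safe range:

```python
import numpy as np

def linear_cka(X, Y, eps=1e-12):
    # X: (n_examples, d1), Y: (n_examples, d2) layer activations
    X = X - X.mean(axis=0, keepdims=True)  # center each feature
    Y = Y - Y.mean(axis=0, keepdims=True)
    X = X / (np.linalg.norm(X) + eps)      # CKA is scale-invariant, so this
    Y = Y / (np.linalg.norm(Y) + eps)      # rescaling only helps numerics
    hsic_xy = np.linalg.norm(Y.T @ X) ** 2  # linear-kernel HSIC terms
    norm_x = np.linalg.norm(X.T @ X)
    norm_y = np.linalg.norm(Y.T @ Y)
    return hsic_xy / (norm_x * norm_y + eps)
```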
Let me know if any of these approaches work for you. Also, is your input data normalized in any way?
Thank you for your rapid reply and the advice, I'll give it a shot!
The raw data is audio files; in my case, the inputs are filter banks, and they are normalized by mean and variance.
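Concretely, the normalization is per-utterance mean/variance normalization along the time axis, roughly like this (just a sketch; the function name and the epsilon are my own choices):

```python
import numpy as np

def cmvn(fbank, eps=1e-8):
    # fbank: (n_frames, n_mels) log-mel filter-bank features
    mean = fbank.mean(axis=0, keepdims=True)
    std = fbank.std(axis=0, keepdims=True)
    return (fbank - mean) / (std + eps)  # eps guards near-silent segments
```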
I have used your code to calculate CKA matrices and am also facing the hard-to-trace NaN error. As far as feature/matrix normalization is concerned, I have tried the following two approaches:
```python
from sklearn.preprocessing import normalize
import numpy as np

def _normalize_matrix(matrix):
    # L2-normalize each row; keepdims makes the division broadcast correctly
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # avoid dividing all-zero rows by zero
    return matrix / norms
    # Alternative: row normalization via scikit-learn (axis=0 would be columns)
    # return normalize(matrix, axis=1, norm="l1")
```
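For reference, I apply it to the stacked layer activations right before they go into the CKA computation (the shapes here are just illustrative):

```python
acts = np.random.randn(32, 512)  # (batch, features)
acts = _normalize_matrix(acts)
```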
In my experience, the first approach worked best: the number of NaNs was strongly reduced. Anyway, are there other normalization approaches you would recommend?
Yeah, I am having the same issue, unfortunately. CKA for image networks works flawlessly, but for the speech dataset I am trying, HSIC returns NaN values. I tried normalizing the features and the autocorrelation matrices, but it did not help.
I am also having the same problem, and normalizing is not helping. But maybe I am normalizing the wrong thing? I am normalizing the X and Y matrices before they are passed to the _HSIC method.
I got the same issue: https://github.com/AntixK/PyTorch-Model-Compare/issues/10. Looking for solutions. Thanks!
Thanks for this great little module! I was able to adapt the code to models used for speech recognition (mostly transformers and conformers), and I'm learning a lot from the CKA outputs.
One problem I face is that for some models, some layers hit this assert after a certain number of batches. For example, if I try to pass 300 batches of 32 through the model, I hit a NaN exception around batch 150. It doesn't seem related to the data itself, because I shuffle the data and get the same exception after the same number of batches.
I suspect this is a numerical stability problem. Are there assumptions about the range of the layer features and outputs?
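In case it helps others debug the same thing, here is the kind of forward hook I use to find which layer first produces non-finite activations (a sketch; `attach_nan_hooks` and the stand-in model are mine, not part of this repo):

```python
import torch
import torch.nn as nn

def attach_nan_hooks(model):
    # Report any layer whose output contains NaN/Inf values
    def make_hook(name):
        def hook(module, inputs, output):
            if isinstance(output, torch.Tensor) and not torch.isfinite(output).all():
                print(f"non-finite output at layer: {name}")
        return hook
    for name, module in model.named_modules():
        if name:  # skip the root module itself
            module.register_forward_hook(make_hook(name))

# Stand-in model just to show the wiring
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 256))
attach_nan_hooks(model)
_ = model(torch.randn(32, 80))
```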