AntixK / PyTorch-Model-Compare

Compare neural networks by their feature similarity
MIT License

getting spurious "HSIC computation resulted in NANs" #2

Open jprobichaud opened 2 years ago

jprobichaud commented 2 years ago

Thanks for this great little module! I was able to adapt the code to deal with models suitable for speech recognition (mostly transformers and conformers) and I'm learning a lot from the CKA outputs.

One problem I face is that for some models, some layers hit this assert after a certain number of batches. Basically, if I try to pass 300 batches of 32 through the model, I end up with the NaN exception around batch 150 or so. It doesn't seem related to the data, because I shuffle the data and still get the exception after the same number of batches.

I guess this is a numerical stability problem. Are there any assumptions about the range of the layer features and outputs?

AntixK commented 2 years ago

Hi, I am glad that you find this code useful.

Yes, the numerical issues occur mostly because the features of the two models have different scales. I usually tackle the issue in two ways:

1) Simply normalize the features (or the autocorrelation matrices) before computing HSIC.
2) Compare only the outputs of the normalization layers of the networks. They will usually be well behaved, as they are crucial for network stability.
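
For (1), roughly something like this (just a sketch, not the library's exact internals; the tensor shapes and names are only illustrative):

import torch

def normalize_features(x: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    # Zero-mean, unit-variance per feature dimension, so both models'
    # activations end up on a comparable scale before the HSIC step.
    x = x - x.mean(dim=0, keepdim=True)
    return x / (x.std(dim=0, keepdim=True) + eps)

# Toy activations standing in for two layers' flattened outputs.
feat1 = normalize_features(torch.randn(32, 512))   # (batch, dim1)
feat2 = normalize_features(torch.randn(32, 768))   # (batch, dim2)

# Linear-kernel Gram matrices that would feed into the HSIC/CKA computation.
K = feat1 @ feat1.t()
L = feat2 @ feat2.t()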

Let me know if any of these approaches work for you. Also, is your input data normalized in any way?

jprobichaud commented 2 years ago

Thank you for your rapid reply and the advice, I'll give it a shot!

The raw data is audio files; in my case, the inputs are filter banks, and they go through mean and variance normalization.

phrasenmaeher commented 2 years ago

I have used your code to calculate CKA matrices, and am also facing the hard-to-trace NaN error. As far as feature/matrix normalization is concerned, I have tried the following two approaches:

from sklearn.preprocessing import normalize
import numpy as np

def _normalize_matrix(matrix):
    # Approach 1: divide each row by its L2 norm.
    # keepdims=True gives norms shape (n, 1), so the division broadcasts row-wise.
    norms = np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix / np.maximum(norms, 1e-12)   # avoid division by zero

    # Approach 2: row normalization via sklearn (axis=0 would be column-wise):
    # return normalize(matrix, axis=1, norm="l1")
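
For reference, a quick sanity check (the array shape is just illustrative) that the first approach leaves every row with unit norm:

import numpy as np

feats = np.random.randn(32, 512)            # stand-in for one batch of flattened activations
normed = _normalize_matrix(feats)           # function from the snippet above
print(np.allclose(np.linalg.norm(normed, axis=1), 1.0))   # True: all rows have unit L2 norm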

In my experience, the first approach worked best; the number of NaNs was strongly reduced. Are there any normalization approaches you would recommend?

mahdibeit commented 2 years ago

Yeah, I am having the same issue unfortunately. CKA for image networks works flawlessly. However, for the speech dataset I am working with, HSIC returns NaN values. I tried normalizing the features and the autocorrelation matrices, but it did not work.

Maddy12 commented 2 years ago

I am also having the same problem, and normalizing is not helping, but maybe I am normalizing the wrong thing? I am normalizing the X and Y matrices before they are passed to the _HSIC method.

bryanbocao commented 1 year ago

I got the same issue: https://github.com/AntixK/PyTorch-Model-Compare/issues/10. Looking for solutions. Thanks!