Works fine with the whole model but raise "NANs" on selected layers.

AntixK / PyTorch-Model-Compare

Compare neural networks by their feature similarity

MIT License


Works fine with the whole model but raise "NANs" on selected layers. #4

Open PengHongyiNTU opened 2 years ago

PengHongyiNTU commented 2 years ago

When I was trying to compare the same model trained on different datasets, I encountered a weird problem:

It works fine when I compare all layers: cka = CKA(model1, model2, device='cuda', model1_name='model1', model2_name='model2')

But when I try to compare a selected subset of layers: cka = CKA(model1, model2, device='cuda', model1_name='model1', model2_name='model2', model1_layers=list(model1.state_dict().keys())[:5], model2_layers=list(model2.state_dict().keys())[:5])

It raises:

HSIC computation resulted in NANs

Do you have any idea how to fix this? Thank you very much.

kssteven418 commented 2 years ago

In my case, the NaN error occurred when no hooks were created. With no hooks, no features are returned, which leads to the division-by-zero error at L180. It might help to check whether the hooks are being registered properly.
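
One way to run that check is sketched below. It assumes (not verified against the library source) that layers are selected by matching the requested names against the names from model.named_modules(). If that is the case, keys taken from state_dict() are parameter names such as "0.weight" rather than module names such as "0", so none of them would match and no hook would fire. The toy model and the helper names split_layer_names and count_fired_hooks are hypothetical, introduced only for illustration.

```python
import torch
import torch.nn as nn


def split_layer_names(model, requested):
    """Split requested names into those matching a module name and the rest."""
    module_names = {name for name, _ in model.named_modules() if name}
    matched = [n for n in requested if n in module_names]
    unmatched = [n for n in requested if n not in module_names]
    return matched, unmatched


def count_fired_hooks(model, layer_names, example_input):
    """Register forward hooks on the named modules and count how many fire."""
    features = {}
    handles = []

    def make_hook(name):
        def hook(module, inputs, output):
            features[name] = output
        return hook

    for name, module in model.named_modules():
        if name in layer_names:
            handles.append(module.register_forward_hook(make_hook(name)))
    with torch.no_grad():
        model(example_input)
    for handle in handles:
        handle.remove()  # leave the model unchanged after the check
    return len(features)


# Hypothetical toy model standing in for model1 / model2 from the question.
toy_model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
example_input = torch.randn(6, 4)

# Parameter names (from state_dict) versus module names (from named_modules).
param_keys = list(toy_model.state_dict().keys())[:5]
module_names = [name for name, _ in toy_model.named_modules() if name][:5]

print(split_layer_names(toy_model, param_keys))
print(count_fired_hooks(toy_model, param_keys, example_input))
print(count_fired_hooks(toy_model, module_names, example_input))
```

If the count for your selected names is zero, passing names taken from named_modules() as model1_layers and model2_layers may be worth trying instead of state_dict() keys.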

Maddy12 commented 2 years ago

I think the issue I am having is that, no matter what, (N - 1) * (N - 2) in line 134 ends up equalling 0, because the matrix being passed in is always of size 2x2. That matrix comes from around line 181, K = X @ X.t().

I discovered this happens when the batch size is <= 2. So if anyone else has this issue this might be why!

I set this batch size because the method is so slow and memory-consuming. Are there any tricks for running it without large batch sizes or heavy computation?
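
The batch-size observation above can be turned into a small guard. This sketch assumes the library follows the standard unbiased HSIC estimator, whose denominators are (N - 1) * (N - 2), N - 2, and N * (N - 3). Only the first of these is confirmed in this thread, so the other two are an assumption. Under that assumption a batch of 3 would also fail, and 4 would be the smallest usable batch size. The function names are hypothetical.

```python
def hsic_denominators(n):
    """Denominators of the standard unbiased HSIC estimator for batch size n."""
    return {
        "(N - 1) * (N - 2)": (n - 1) * (n - 2),
        "N - 2": n - 2,
        "N * (N - 3)": n * (n - 3),  # assumed term, not confirmed in the thread
    }


def check_batch_size(n):
    """Raise ValueError if any denominator vanishes for this batch size."""
    zero_terms = [expr for expr, value in hsic_denominators(n).items() if value == 0]
    if zero_terms:
        raise ValueError(f"batch size {n} makes these denominators zero: {zero_terms}")
    return n


# Report which small batch sizes would trigger a division by zero.
for n in range(1, 6):
    try:
        check_batch_size(n)
        print(n, "ok")
    except ValueError as err:
        print(n, "invalid:", err)
```

Note that the last batch from a DataLoader can be smaller than the configured batch size, so a short final batch may hit the same problem. Setting drop_last=True on the DataLoader is one way to avoid that.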

bryanbocao commented 1 year ago

I got the same issue: https://github.com/AntixK/PyTorch-Model-Compare/issues/10. Looking for solutions. Thanks!