choasma / HSIC-bottleneck

The HSIC Bottleneck: Deep Learning without Back-Propagation
https://arxiv.org/abs/1908.01580
MIT License

Question on normalized HSIC calculating #9

Closed lizhenstat closed 3 years ago

lizhenstat commented 3 years ago

Hi, thanks for your work! It's very interesting. I have one question about the normalized HSIC calculation in hsic.py. The normalized HSIC is calculated as:

def hsic_normalized(x, y, sigma=None, use_cuda=True, to_numpy=True):
    m = int(x.size()[0])
    Pxy = hsic_regular(x, y, sigma, use_cuda)
    Px = torch.sqrt(hsic_regular(x, x, sigma, use_cuda))
    Py = torch.sqrt(hsic_regular(y, y, sigma, use_cuda))
    thehsic = Pxy/(Px*Py)
    return thehsic

It seems this equation comes from CKA (Centered Kernel Alignment), from the paper "Similarity of Neural Network Representations Revisited":

[image: CKA formula]

and not from equation (5) in the HSIC Bottleneck paper:

[image: equation 5]

Am I understanding this correctly? Thanks in advance.
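For readers following along, here is a minimal, self-contained sketch of the CKA-style normalization that the quoted `hsic_normalized` computes, namely HSIC(x, y) / sqrt(HSIC(x, x) * HSIC(y, y)). It uses NumPy rather than PyTorch to stay lightweight. The helper `centered_gaussian_kernel`, its `H K H` centering, and the default `sigma=1.0` are assumptions made for illustration; the repository's own `kernelmat` is not shown in this thread and may differ.

```python
import numpy as np

def centered_gaussian_kernel(x, sigma=1.0):
    """Gaussian kernel matrix over the rows of x, centered as H K H.

    Hypothetical stand-in for the repo's `kernelmat`, which is not shown here.
    """
    sq_norms = np.sum(x ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (x @ x.T)
    K = np.exp(-sq_dists / (2.0 * sigma ** 2))
    m = x.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m  # centering matrix
    return H @ K @ H

def hsic(x, y, sigma=1.0):
    """Mean of the elementwise product, mirroring the quoted hsic_regular."""
    Kx = centered_gaussian_kernel(x, sigma)
    Ky = centered_gaussian_kernel(y, sigma)
    return np.mean(Kx * Ky.T)

def hsic_cka(x, y, sigma=1.0):
    """CKA-style normalization: HSIC(x, y) / sqrt(HSIC(x, x) * HSIC(y, y))."""
    return hsic(x, y, sigma) / np.sqrt(hsic(x, x, sigma) * hsic(y, y, sigma))

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 5))
y = rng.standard_normal((32, 3))

self_score = hsic_cka(x, x)   # a representation compared with itself
cross_score = hsic_cka(x, y)  # two independently drawn samples
print(self_score, cross_score)
```

Because the same constant scales the numerator and the denominator, any fixed prefactor in `hsic` (such as 1/m^2) cancels in the ratio, and comparing a sample with itself gives a score of 1 up to floating-point error.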

choasma commented 3 years ago

Yes, you're right. We implemented both approaches to computing the normalized HSIC. However, in our training pipeline we actually use hsic_normalized_cca (link) instead, which is invoked here (link).

The reason is that we had a hard time finding good parameters for the equation you posted, so we switched to the CCA (canonical correlation analysis) version. Thanks for visiting our repo! Please feel free to ask any questions.

lizhenstat commented 3 years ago

@choasma Oh, I got your point! Thanks for your detailed and quick reply, thanks a lot!

choasma commented 3 years ago

@lizhenstat No worries! Feel free to open an issue if anything is ambiguous!

spdj2271 commented 6 months ago

def hsic_regular(x, y, sigma=None, use_cuda=True, to_numpy=False):
    Kxc = kernelmat(x, sigma)
    Kyc = kernelmat(y, sigma)
    KtK = torch.mul(Kxc, Kyc.t())
    Pxy = torch.mean(KtK)  # Is it torch.trace(KtK)?
    return Pxy

Hi! Thanks for your interesting work~ I have a query regarding the computation of HSIC. You're employing 'Pxy = torch.mean(KtK)' to calculate HSIC, but according to the definition of $HSIC(X,Y)=tr(K_XK_Y)$, it should be 'Pxy = torch.trace(KtK)'. Am I understanding this correctly?
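A quick numerical check may help with this question. `torch.mul` is an elementwise product, so summing all entries of `Kxc * Kyc.t()` already equals tr(Kxc Kyc); taking the mean only divides that by m^2. Taking `torch.trace` of the elementwise product, by contrast, would keep just the diagonal terms. The sketch below uses NumPy with random symmetric matrices as stand-ins for the centered kernel matrices (an assumption made for illustration, not the repo's data).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 6

# Random symmetric positive semidefinite matrices standing in for Kxc and Kyc.
A = rng.standard_normal((m, m))
B = rng.standard_normal((m, m))
Kx = A @ A.T
Ky = B @ B.T

elementwise = Kx * Ky.T         # analogue of torch.mul(Kxc, Kyc.t())
mean_form = elementwise.mean()  # analogue of torch.mean(KtK)
trace_form = np.trace(Kx @ Ky)  # tr(K_X K_Y) with a true matrix product

# The mean of the elementwise product is the trace scaled by 1/m^2.
same_up_to_scale = np.isclose(mean_form, trace_form / m ** 2)

# The trace of the elementwise product only sums Kx[i, i] * Ky[i, i],
# so it drops every off-diagonal contribution.
trace_of_elementwise = np.trace(elementwise)

print(same_up_to_scale, trace_form, trace_of_elementwise)
```

So the quoted code appears to compute tr(K_X K_Y) / m^2 rather than the unscaled trace. The commonly cited biased estimator uses a 1/(m-1)^2 factor instead; either constant cancels in the CKA-style ratio used by `hsic_normalized`, though it does change the absolute value returned by `hsic_regular`.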