TTN-YKK / Clustering_friendly_representation_learning


The formula in the paper does not correspond well with the actual code #2

Open Li-Hyn opened 2 years ago

Li-Hyn commented 2 years ago

Hello! Thank you for open-sourcing the code and sharing the ideas presented in the paper; it has been very helpful. After reading the paper carefully, I cannot quite match the loss terms in the paper to the code, especially the proposed $L_{FO}$ loss. I am particularly interested in the feature regularization part; could you please elaborate on it? From the code alone it is hard for me to see the details. The loss function looks like this:

$$L_{F} = -\sum_{l=1}^{d} \log Q(l \mid \boldsymbol{f}) = \sum_{l=1}^{d}\left(-\boldsymbol{f}_{l}^{T}\boldsymbol{f}_{l}/\tau_{2} + \log \sum_{j}^{d} \exp\left(\boldsymbol{f}_{j}^{T}\boldsymbol{f}_{l}/\tau_{2}\right)\right)$$

The corresponding code looks like this:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Loss(nn.Module):
    def __init__(self, tau2):
        super().__init__()
        self.tau2 = tau2

    def forward(self, x, ff, y):
        # features = norm(net(inputs))
        # outputs = npc(features, indexes)
        # loss_id, loss_fd = loss(outputs, features, indexes)
        # F.cross_entropy(input, target)
        L_id = F.cross_entropy(x, y)

        norm_ff = ff / (ff**2).sum(0, keepdim=True).sqrt()
        coef_mat = torch.mm(norm_ff.t(), norm_ff)
        coef_mat.div_(self.tau2)
        a = torch.arange(coef_mat.size(0), device=coef_mat.device)
        L_fd = F.cross_entropy(coef_mat, a)
        return L_id, L_fd
```
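
For context, this is roughly how I am wiring the loss into my training step, following the comments in `forward`. The names `net`, `npc`, and the `weight` on `loss_fd` below are my own stand-ins and guesses, not necessarily your pipeline:

```python
import torch.nn.functional as F


def train_step(net, npc, criterion, optimizer, inputs, indexes, weight=1.0):
    # features = norm(net(inputs)); outputs = npc(features, indexes)
    features = F.normalize(net(inputs), dim=1)   # (batch, d) L2-normalized embeddings
    outputs = npc(features, indexes)             # instance-discrimination logits
    loss_id, loss_fd = criterion(outputs, features, indexes)
    loss = loss_id + weight * loss_fd            # how the two terms are weighted is my guess
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss_id.item(), loss_fd.item()
```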
TTN-YKK commented 2 years ago

Hi, thank you for your attention. As described in the paper, we recommend using IDFD because it is more stable than IDFO, so we do not provide the code for $L_{FO}$. However, the implementation of $L_{FO}$ is simple and straightforward. Here is an example.

`L_fo = torch.pow(torch.mm(ff.t(), ff) - torch.eye(ff.size(1)), 2).mean()`
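
Written out as a small function with explicit device and dtype handling, it could look like the sketch below. This is only an illustration, not our exact implementation, and whether `ff` should already be normalized depends on your setup:

```python
import torch


def feature_orthogonality_loss(ff):
    # ff: feature matrix of shape (batch, d); normalize beforehand if your setup requires it.
    eye = torch.eye(ff.size(1), device=ff.device, dtype=ff.dtype)
    # Penalize the deviation of the d x d feature Gram matrix from the identity.
    return torch.pow(torch.mm(ff.t(), ff) - eye, 2).mean()
```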
Li-Hyn commented 2 years ago


Thank you for your reply, it is very helpful! I am currently trying to reproduce the results on the other datasets mentioned in your paper, but I do not reach the reported values. Could you provide the data augmentation scheme and the relevant parameters you used for the experiments on STL and the ImageNet subsets, for reference? Are there any other details that need attention? Thanks a lot.

TTN-YKK commented 2 years ago

As described in the paper, for IDFD we used the same parameters for all datasets, with the exception of the crop size. For IDFO, $\alpha$ and the crop size are changed according to the dataset; the other parameters are the same. Please see the paper for details.