bluecdm / Long-tailed-recognition

Questions regarding the code #2

Closed · milliema closed this issue 11 months ago

milliema commented 11 months ago

Thanks for the awesome work! I have several questions related to the code:

  1. In the GMLLoss, two contrastive losses are computed: loss_con and loss_con_T. What is the difference between them? Is the former used to train the model backbone, while the latter is for training Tc?
  2. When computing loss_con and loss_con_T, the masks used are different: loss_con is based on qk_mask, a torch.ones matrix, while loss_con_T is based on qk_mask_T, a non-self mask. Why? https://github.com/bluecdm/Long-tailed-recognition/blob/365f8eb4c5b2ac900739ec6ef605dcea9d52cb13/losses.py#L63-L69
  3. In the computation of the classification loss, a very small temperature Ts = 1/30 is adopted, yet most works set this value to 1. Could you please explain why Ts should be so small, and what the motivation and benefits of this setting are?

Thanks a lot!

bluecdm commented 11 months ago

Thank you for your interest in our work.

  1. Exactly. loss_con is used to train the backbone model, and loss_con_T is used to train the temperature parameter (T_c); see the first sketch after this list. A detailed explanation is provided in Section 4.9 of our paper (https://openreview.net/pdf?id=KqNX6VOqnJ).

  2. We use a non-self mask when training T_c in order to exclude the self-augmented sample when estimating T_c, i.e., the variance of the Gaussian kernel. Including the self-augmented sample would result in a significantly low T_c, because that sample is highly correlated with the input.

  3. Since we use a cosine-similarity classifier (a normed linear classifier), the scale of the output logits is significantly smaller than that of a linear classifier, so we need to scale the logits up using a small temperature (see the second sketch below). We followed the hyperparameter setting of previous long-tailed recognition literature (https://github.com/FlamieZhu/Balanced-Contrastive-Learning/blob/main/models/resnext.py).
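To make points 1 and 2 concrete, here is a minimal sketch of the two-loss split. It is not the repository's exact losses.py; the function names and the exact masking/averaging details are hypothetical, but it shows the structure the answers describe: loss_con keeps the self pair (all-ones qk_mask) and detaches T_c, while loss_con_T detaches the similarities and uses the non-self qk_mask_T.

```python
import torch


def supcon_loss(logits, pos_mask):
    # Per-anchor supervised contrastive loss: average log-probability of
    # the positive pairs under a softmax over each row of logits.
    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    return -(log_prob * pos_mask).sum(1).div(pos_mask.sum(1).clamp(min=1)).mean()


def gml_style_losses(q, k, labels, log_T_c):
    """q, k: (N, D) L2-normalized embeddings of two augmented views;
    labels: (N,) class labels; log_T_c: learnable scalar, T_c = exp(log_T_c)."""
    N = q.size(0)
    sim = q @ k.t()                                           # cosine similarities
    pos = labels.view(-1, 1).eq(labels.view(1, -1)).float()   # same-class pairs
    self_mask = torch.eye(N, dtype=torch.bool, device=q.device)

    T_c = log_T_c.exp()

    # loss_con trains the backbone: T_c is detached so gradients only reach
    # the embeddings, and the all-ones qk_mask keeps the self-augmented pair
    # (the diagonal) as a positive.
    loss_con = supcon_loss(sim / T_c.detach(), pos)

    # loss_con_T trains T_c: the similarities are detached, and the non-self
    # qk_mask_T drops the diagonal. The self-augmented key is nearly identical
    # to the query, so including it would drag the estimated T_c (the
    # Gaussian-kernel variance) toward zero.
    neg_fill = torch.finfo(sim.dtype).min
    logits_T = (sim.detach() / T_c).masked_fill(self_mask, neg_fill)
    loss_con_T = supcon_loss(logits_T, pos * (~self_mask).float())

    return loss_con, loss_con_T
```

In this layout the two losses can simply be summed: the stop-gradients guarantee that loss_con only updates the encoder and loss_con_T only updates log_T_c.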
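And for point 3, a small sketch of a normed linear (cosine-similarity) classifier in the style of the linked BCL resnext.py; the class name, dimensions, and initialization here are illustrative, not the repository's exact code. Because each logit is a cosine similarity bounded in [-1, 1], the softmax over the raw logits is nearly uniform; dividing by Ts = 1/30 (i.e., multiplying by 30) restores a usable logit scale.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class NormedLinear(nn.Module):
    """Cosine-similarity classifier: logits are dot products between
    L2-normalized features and L2-normalized class weight vectors."""

    def __init__(self, in_features, num_classes):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(num_classes, in_features))
        nn.init.kaiming_uniform_(self.weight, a=5 ** 0.5)

    def forward(self, x):
        # Each output lies in [-1, 1] regardless of feature magnitude.
        return F.linear(F.normalize(x, dim=-1), F.normalize(self.weight, dim=-1))


T_s = 1.0 / 30.0                            # small temperature from the thread
classifier = NormedLinear(64, 10)           # hypothetical sizes
features = torch.randn(8, 64)
logits = classifier(features) / T_s         # equivalently: 30 * cosine similarity
loss = F.cross_entropy(logits, torch.randint(0, 10, (8,)))
```

A plain nn.Linear can produce logits of arbitrary magnitude, so T = 1 works there; with the normed classifier the temperature takes over that scaling role, which is why a value far below 1 is needed.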