deeplearning-wisc / hypo


Why not discard the projection head at inference time? #3

Closed: JungHunOh closed this issue 4 months ago

JungHunOh commented 5 months ago

Hello, thank you for the inspiring work.

While many projects discard the non-linear projection head after training, HYPO appears to retain it. The code seems to be based on CIDER (https://github.com/deeplearning-wisc/cider), which, like those projects, discards the projection head at inference time.

Could you elaborate on the reasoning behind this decision? I would appreciate further discussion on this issue.

alvinmingsf commented 5 months ago

Thanks for your interest in our work! For self-supervised contrastive methods, it is common practice to discard the projection head after training (see, e.g., Sec. 3.2 of https://arxiv.org/abs/2304.12210 for more details and experiments on the design of the projection head). For supervised SSL, since we can simply predict the class label from the class prototypes (P4), whether to discard the projection head becomes a design choice. In addition, retaining the projection head makes the theoretical analysis cleaner, as the final prediction lives in the same space as the feature embeddings learned with the HYPO loss.
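To make the prototype-based prediction concrete, here is a minimal sketch in plain Python. It is not taken from the HYPO codebase: the function names, the toy prototypes, and the use of cosine similarity on unit-normalized vectors are assumptions made for illustration. The point is only that when the projection head is retained, the embedding compared against the prototypes is the head's output, so prediction happens in the same space the loss was applied to.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (hypothetical helper, not from the repo)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def predict_from_prototypes(head_output, prototypes):
    """Return the index of the class prototype most similar to the embedding.

    Assumption: `head_output` is the projection head's output for one sample,
    and similarity is the cosine between unit-normalized vectors.
    """
    z = l2_normalize(head_output)
    scores = [
        sum(a * b for a, b in zip(z, l2_normalize(p)))
        for p in prototypes
    ]
    # argmax over the per-class similarity scores
    return max(range(len(scores)), key=scores.__getitem__)


# Toy example: three 2-D class prototypes and one embedding.
prototypes = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
head_output = [0.2, 0.9]  # closest in direction to the second prototype
predicted = predict_from_prototypes(head_output, prototypes)
```

If the head were discarded instead, the backbone features would usually have a different dimensionality from the prototypes, so the prototypes would have to be recomputed in that space or a separate classifier fitted on top.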