Closed JiuqingDong closed 5 months ago
Thanks for your question. The original CLIP code multiplies the cosine similarity by 100 as a softmax temperature. In the field of OOD detection, it is important to remove this factor of 100 and set the temperature to 1, as shown in the MCM paper. So I follow MCM.
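To illustrate why the temperature matters, here is a minimal sketch (with hypothetical similarity values, not taken from either repo) of how scaling cosine similarities by 100 saturates the softmax, while temperature 1 keeps the maximum probability soft enough to separate ID from OOD samples:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# hypothetical cosine similarities between one image and 3 class prompts
sims = np.array([0.30, 0.25, 0.20])

# CLIP-style logits: multiplied by 100 (a very low softmax temperature)
p_scaled = softmax(100 * sims)
# temperature 1, as MCM uses for OOD detection
p_temp1 = softmax(sims)

print(p_scaled.max())  # near 1.0: the max softmax score saturates
print(p_temp1.max())   # much softer maximum, preserving score separation
```

With the scaled logits, almost every input gets a near-1 maximum softmax score, so the MCM-style score can no longer distinguish in-distribution from OOD inputs.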
I understand, but I didn't see such a hyperparameter in MCM, so it confused me.
Yes. In my code, we compute logits_per_image = logit_scale * image_features @ text_features.t()
in https://github.com/AtsuMiyai/LoCoOp/blob/master/clip_w_local/model.py#L406,
so we need to divide by logit_scale (which is 100).
For MCM, they directly calculate the logits without logit_scale: output = image_features @ text_features.T
in https://github.com/deeplearning-wisc/MCM/blob/640657ea67cb961045e0999301a6b8101dad65ba/utils/detection_util.py#L232C17-L232C58,
so they don't need to divide.
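The two code paths end up with the same scores. A small sketch (with random, hypothetical normalized features; logit_scale fixed at 100 for illustration) showing that dividing the LoCoOp-style scaled logits by logit_scale recovers exactly the unscaled cosine similarities that MCM computes directly:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical L2-normalized features: 2 images, 3 text prompts, dim 8
image_features = rng.normal(size=(2, 8))
image_features /= np.linalg.norm(image_features, axis=-1, keepdims=True)
text_features = rng.normal(size=(3, 8))
text_features /= np.linalg.norm(text_features, axis=-1, keepdims=True)

logit_scale = 100.0  # CLIP's learned scale, roughly 100 after training

# LoCoOp path: scaled logits, then divide back before the softmax
logits_per_image = logit_scale * image_features @ text_features.T
output_locoop = logits_per_image / logit_scale

# MCM path: raw cosine similarities, no division needed
output_mcm = image_features @ text_features.T

assert np.allclose(output_locoop, output_mcm)
```

So the division by 100 is not an extra hyperparameter; it just undoes CLIP's built-in scaling so both implementations feed temperature-1 similarities into the softmax.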
Dear Author,
I have a question: why do you divide the output by 100?
Is the '100' a hyperparameter? Why do different scaling values yield different OOD performance?