ZhexinLiang / CLIP-LIT

[ICCV 2023, Oral] Iterative Prompt Learning for Unsupervised Backlit Image Enhancement
https://zhexinliang.github.io/CLIP_LIT_page/

A question about the get_clip_score_from_feature function #20

Open xiaolong-217 opened 6 months ago

xiaolong-217 commented 6 months ago

In this function, the similarity between image_features and text_features is computed as: similarity = (100.0 * (image_features/image_nor) @ (text_features/nor).T).softmax(dim=-1). If this value is used as a loss, doesn't minimizing it push image_features and text_features to be as orthogonal as possible? Shouldn't the goal instead be for image_features and text_features to be as similar as possible? I would appreciate an answer. Thank you very much!
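
For reference, here is a minimal sketch of what the quoted expression computes, written in plain Python rather than PyTorch so it runs without extra dependencies. The helper names (l2_normalize, softmax, clip_style_probs) and the toy feature values are hypothetical and are not taken from the repository. The sketch covers only the quoted line. How the result is turned into a loss is not shown in the question and is not assumed here.

```python
import math

def l2_normalize(v):
    # Divide a vector by its L2 norm, mirroring image_features / image_nor.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def clip_style_probs(image_feature, text_features, scale=100.0):
    # Cosine similarity between one image feature and each text feature,
    # scaled by 100 and passed through a softmax over the text prompts.
    img = l2_normalize(image_feature)
    logits = []
    for t in text_features:
        txt = l2_normalize(t)
        cos = sum(a * b for a, b in zip(img, txt))
        logits.append(scale * cos)
    return softmax(logits)

# Hypothetical toy features: one image vector and two prompt vectors.
image_feature = [1.0, 0.2]
text_features = [[1.0, 0.0], [0.0, 1.0]]
probs = clip_style_probs(image_feature, text_features)
print(probs)
```

Because of the softmax over prompts, the output is a probability distribution across the text features rather than a raw cosine similarity. In this sketch, the entry for a given prompt grows as the image feature aligns more closely with that prompt relative to the others.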