Ziyang412 / UCoFiA

PyTorch code for "Unified Coarse-to-Fine Alignment for Video-Text Retrieval" (ICCV 2023)
https://arxiv.org/abs/2309.10091
MIT License
62 stars, 0 forks

logit_scale #7

Closed. Arsiuuu closed this issue 5 months ago

Arsiuuu commented 5 months ago

Thanks for the great work! When training or evaluating the model, why is logit_scale multiplied only into the video-sentence score and the sentence-frame score, but not into pixel_word_score? Does this mean that the model cares more about the first two and ignores the latter? https://github.com/Ziyang412/UCoFiA/blob/517f838483af544304482bc70ee7ff4886d3dfc6/eval_v2t/modules/modeling_ucofia.py#L409
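
To make the concern concrete, here is a minimal sketch of how scaling only two of three scores can change their relative weight. It assumes the three scores are simply summed into one retrieval score, which is an assumption of this sketch and not a reading of the linked code. `combine_scores`, `best_candidate`, and the numbers are hypothetical.

```python
def combine_scores(video_sentence, sentence_frame, pixel_word, logit_scale=1.0):
    """Sum three per-candidate similarity lists, scaling only the first two.

    Hypothetical helper: the real model may combine its scores differently.
    """
    return [
        logit_scale * (vs + sf) + pw
        for vs, sf, pw in zip(video_sentence, sentence_frame, pixel_word)
    ]


def best_candidate(scores):
    """Return the index of the highest-scoring candidate."""
    return max(range(len(scores)), key=scores.__getitem__)


# Made-up similarities for one query against two candidates.
video_sentence = [0.30, 0.25]
sentence_frame = [0.20, 0.20]
pixel_word = [0.10, 0.30]

# With a scale of 1 the three scores contribute equally.
unit = combine_scores(video_sentence, sentence_frame, pixel_word, logit_scale=1.0)

# With a large scale the unscaled pixel_word term is effectively down-weighted.
large = combine_scores(video_sentence, sentence_frame, pixel_word, logit_scale=10.0)

print(best_candidate(unit), best_candidate(large))
```

In this toy example the top-ranked candidate changes once the scale moves away from 1, so whether the asymmetry matters depends on the value logit_scale actually takes.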

Ziyang412 commented 5 months ago

Hi, thanks for the question! We do not configure logit_scale specifically for any of the scores (it is initialized by the following code). Thanks! https://github.com/Ziyang412/UCoFiA/blob/517f838483af544304482bc70ee7ff4886d3dfc6/train/modules/module_clip.py#L380

Arsiuuu commented 5 months ago

Sorry, I did not express my question clearly. What I want to ask is why logit_scale is not multiplied into pixel_word_score. Or do you mean that logit_scale is never updated? https://github.com/Ziyang412/UCoFiA/blob/517f838483af544304482bc70ee7ff4886d3dfc6/train/modules/modeling_ucofia.py#L368

Ziyang412 commented 5 months ago

I think "logit_scale" was adapted from another codebase, which is causing the confusion. According to the code below, it is reset to its initial value each round. By the way, love your profile picture.

https://github.com/Ziyang412/UCoFiA/blob/517f838483af544304482bc70ee7ff4886d3dfc6/train/modules/modeling_ucofia.py#L339

Arsiuuu commented 5 months ago

Haha, COYG! So can I take it that logit_scale has no effect in training or testing because it is always 1, and that I can remove it?

Ziyang412 commented 5 months ago

Yes, I think so. COYG
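
As a quick sanity check of that conclusion, the sketch below shows that multiplying similarity logits by a scale of exactly 1 leaves a softmax cross-entropy loss unchanged, while a larger scale changes it. The loss form, `cross_entropy`, `scaled_loss`, and the numbers are assumptions for illustration and are not taken from the repository.

```python
import math


def cross_entropy(logits, target):
    """Softmax cross-entropy for one example, using a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def scaled_loss(similarities, target, logit_scale):
    """Hypothetical helper: scale the similarities, then compute the loss."""
    return cross_entropy([logit_scale * s for s in similarities], target)


# Made-up similarities; index 0 is the matching pair.
similarities = [0.9, 0.2, -0.1]

loss_plain = cross_entropy(similarities, 0)
loss_unit = scaled_loss(similarities, 0, 1.0)     # scale of 1: same logits
loss_sharp = scaled_loss(similarities, 0, 100.0)  # large scale: sharper softmax

print(loss_plain, loss_unit, loss_sharp)
```

If the scale really is fixed at 1 everywhere it is used, removing the multiplication should not change the loss or the ranking. It is still worth confirming the runtime value before deleting it.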