HarborYuan / ovsam

[ECCV 2024] The official code of paper "Open-Vocabulary SAM".
https://www.mmlab-ntu.com/project/ovsam

How is $Q_{label}$ updated? #42

Closed lchen1019 closed 2 months ago

lchen1019 commented 2 months ago

Hi. As mentioned in your paper, $Q_{label}$ is the key to CLIP2SAM. I noticed that $Q_{label}$ is a learnable token, am I right? The paper also mentions: 'The final labels are derived by calculating the distance between the refined label token and the CLIP text embedding, as in Equ. (1)'. I take this to mean that $Q_{label}$ is aligned with the text embeddings, and the class label is then obtained through cosine similarity. However, I found that in your code the RoI embeddings do not include $Q_{label}$, as shown here: https://github.com/HarborYuan/ovsam/blob/137d2c2e6daea060668cf50d7c966ed86e9c45ce/seg/models/heads/ovsam_head.py#L219 So where does $Q_{label}$ get its gradient for updating? This confuses me. Looking forward to your reply. Thank you in advance!
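
For readers following along, the classification step described in the quoted sentence can be sketched roughly as below. This is only an illustration of picking a label by cosine similarity between one embedding and a set of text embeddings; it is not the repository's code. The names `cosine_similarity` and `classify`, the class names, and the 3-dimensional vectors are all made up for the example.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def classify(label_embedding, text_embeddings):
    """Return the class whose text embedding is closest to label_embedding."""
    scores = {
        name: cosine_similarity(label_embedding, emb)
        for name, emb in text_embeddings.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical text embeddings (a real CLIP text encoder would produce these).
text_embeddings = {
    "cat": [1.0, 0.0, 0.0],
    "dog": [0.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}

# Hypothetical refined label embedding for one predicted mask.
label_embedding = [0.2, 0.9, 0.1]

predicted, scores = classify(label_embedding, text_embeddings)
print(predicted, scores)
```

In this toy setup the embedding points mostly along the "dog" axis, so "dog" gets the highest score.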

HarborYuan commented 2 months ago

Hi @lchen1019

In our paper, $Q_{label}$ is used to describe the "straightforward approach" (please refer to Section 3.2). The code in this repo corresponds to the FPN approach, which is the one adopted by the final ovsam.

The cis_embd here has no effect at all.

Hope this can help.
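
As a side note on the gradient question: a parameter that never enters the computation of the loss receives a zero gradient, so it is never updated. The toy sketch below shows this with a finite-difference gradient in plain Python. It is not taken from the repository; `loss`, `numerical_grad`, `used_weight`, and `unused_token` are hypothetical names chosen for the illustration.

```python
def loss(used_weight, unused_token):
    """Squared error that depends on used_weight only.

    unused_token is accepted as an argument but deliberately never used,
    mimicking a learnable token that is not part of the forward path.
    """
    prediction = used_weight * 3.0
    return (prediction - 6.0) ** 2


def numerical_grad(f, args, index, eps=1e-6):
    """Central finite-difference estimate of df/d(args[index])."""
    plus = list(args)
    minus = list(args)
    plus[index] += eps
    minus[index] -= eps
    return (f(*plus) - f(*minus)) / (2 * eps)


params = [1.0, 0.5]  # [used_weight, unused_token]
grad_used = numerical_grad(loss, params, 0)
grad_unused = numerical_grad(loss, params, 1)
print(grad_used, grad_unused)
```

Here `grad_used` is non-zero while `grad_unused` is exactly zero, which is the situation the question was asking about.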

lchen1019 commented 2 months ago

No wonder... thank you a lot!