Closed · lchen1019 closed this 2 months ago

Hi. As mentioned in your paper, $Q_{label}$ is the key to CLIP2SAM. I noticed that $Q_{label}$ is a learnable token, am I right? The paper also mentions: 'The final labels are derived by calculating the distance between the refined label token and the CLIP text embedding, as in Equ. (1).' That means $Q_{label}$ is aligned with the text embeddings, and the class label is then obtained through cosine similarity.

However, I found that in your code the RoI embeddings do not include $Q_{label}$, as follows:
https://github.com/HarborYuan/ovsam/blob/137d2c2e6daea060668cf50d7c966ed86e9c45ce/seg/models/heads/ovsam_head.py#L219
So where does $Q_{label}$ get the gradient for updating? This confuses me. Looking forward to your reply. Thank you in advance!
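For readers following along, here is a minimal sketch of the classification step the question describes: a learnable label token is fused with the per-RoI query features and classified against CLIP text embeddings by cosine similarity, which is where such a token would pick up gradients from the classification loss. All names (`LabelTokenClassifier`, `label_token`, `roi_embeds`, `text_embeds`) are illustrative assumptions, not taken from the ovsam code, and the refinement step is simplified to a single addition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class LabelTokenClassifier(nn.Module):
    """Illustrative sketch (not the ovsam implementation): a learnable label
    token is fused with per-RoI query features and classified against frozen
    CLIP text embeddings by cosine similarity."""

    def __init__(self, embed_dim: int, logit_scale: float = 100.0):
        super().__init__()
        # Q_label: a single learnable token shared across all RoIs/queries.
        self.label_token = nn.Parameter(torch.randn(1, 1, embed_dim))
        self.logit_scale = logit_scale

    def forward(self, roi_embeds: torch.Tensor, text_embeds: torch.Tensor) -> torch.Tensor:
        # roi_embeds: (B, N, C) refined features for N proposals
        # text_embeds: (K, C) frozen CLIP text embeddings, one per class
        B, N, _ = roi_embeds.shape
        # A real decoder would append the token to the queries and refine it
        # with attention; a simple addition keeps the sketch short. Because the
        # token sits on the forward path to the logits, the classification loss
        # gives it gradients here.
        refined = F.normalize(roi_embeds + self.label_token.expand(B, N, -1), dim=-1)
        text = F.normalize(text_embeds, dim=-1)
        # Cosine-similarity logits against every class text embedding.
        return self.logit_scale * refined @ text.t()  # (B, N, K)
```

With this layout, `classifier(roi_embeds, clip_text_embeds).argmax(-1)` would give the predicted class per proposal.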
Hi @lchen1019,
In our paper, $Q_{label}$ is used to describe the "straightforward approach" (please refer to Section 3.2). The code in this repo corresponds to the FPN approach, which is what the final OVSAM model adopts.
The cls_embd here has no effect at all.
Hope this can help.
No wonder... thank you a lot!