While reading your paper, I encountered a question. The annotated row in Table 2 appears to correspond to the following passage in the text: "computing the scaled dot product [18] between the original SAM’s output token and our HQ-Output token." Performing a dot product between two tokens results in a scalar, so I am curious how this scalar is used to generate the final output mask. Additionally, the citation [18] refers to "Visual Prompt Tuning," which does not seem to mention a "scaled dot product," causing some confusion. Could you please provide specific details on how this experiment was conducted?
Hi, this experiment is done by taking the newly initialized HQ-token and computing its dot product with the original pre-trained output token in SAM. This gives us a new output token, and we then tune the model only on this new output token to predict the high-quality masks.
Thank you very much for your response, but unfortunately my question remains unresolved. A token is a vector representation, and the dot product of two vectors yields a scalar, not a new output token. If possible, could you please provide further clarification or share some PyTorch code snippets to help me better understand? Thank you very much.
Thanks for pointing out the typo. It should be "element-wise product" (or Hadamard product), as here. There is no addition.
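For concreteness, here is a minimal PyTorch sketch of that operation, assuming SAM's usual 256-dimensional tokens. The variable names (`sam_output_token`, `hq_token`, `hyper_mlp`) and all shapes are illustrative placeholders, not the actual repository code:

```python
import torch
import torch.nn as nn

token_dim = 256               # SAM's transformer token width
emb_dim, H, W = 32, 256, 256  # illustrative upscaled mask-embedding shape

# Pre-trained SAM output token: kept frozen in this ablation.
sam_output_token = torch.randn(1, token_dim)

# Newly initialized HQ-token: the part being tuned.
hq_token = nn.Parameter(torch.randn(1, token_dim))

# Element-wise (Hadamard) product of the two tokens. Both are
# (1, token_dim) vectors, so the result is again a (1, token_dim)
# token rather than the scalar a true dot product would give.
new_output_token = sam_output_token * hq_token

# The new token is then consumed like any SAM output token: a
# hypernetwork-style MLP (a placeholder here) maps it to per-mask
# weights that are combined with the upscaled image embedding to
# predict the high-quality mask logits.
hyper_mlp = nn.Sequential(
    nn.Linear(token_dim, token_dim), nn.ReLU(),
    nn.Linear(token_dim, emb_dim),
)
upscaled_embedding = torch.randn(1, emb_dim, H, W)

weights = hyper_mlp(new_output_token)                    # (1, emb_dim)
mask_logits = torch.einsum("bc,bchw->bhw",
                           weights, upscaled_embedding)  # (1, H, W)
```

In this sketch, `sam_output_token` is a plain frozen tensor, so a mask loss on `mask_logits` would update only `hq_token` (and the placeholder MLP), matching the idea of tuning just the new output token.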
Thank you for your response. I have no further questions, and I will proceed to close the issue.