Closed PeiChiChen closed 5 months ago
Thank you for reaching out. We actually use standard cross-attention: the queries come from the foreground features, while the keys and values are both drawn from the codebook.
The codebook is viewed as a collection of rich visual features. BKRM exploits the consistency between foreground and background to retrieve background features from the codebook.
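To make the data flow concrete, here is a minimal sketch of cross-attention with queries taken from foreground features and keys and values taken from a codebook. It is an illustration only, not the BKRM implementation: the learned projection matrices are omitted, and the names `cross_attention`, `foreground`, and `codebook`, as well as the toy values, are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    queries: N vectors of dimension d
    keys:    M vectors of dimension d
    values:  M vectors (one per key)
    Returns the N output vectors and the N x M attention weights.
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Each output is a weighted sum of the value vectors.
        outputs.append(
            [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        )
    return outputs, weights


# Toy data (assumed for illustration): 3 foreground tokens, 2 codebook entries.
foreground = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
codebook = [[2.0, 0.0], [0.0, 2.0]]

# Queries from the foreground; keys AND values from the codebook.
retrieved, attn = cross_attention(foreground, codebook, codebook)
```

Because the values come from the codebook, each row of `retrieved` is a convex combination of codebook entries, so the codebook content itself appears in the output rather than only reweighting the foreground features.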
Your confusion stems mainly from an error in Equation 4. I sincerely apologize for this oversight; it will be corrected in the upcoming revised version.
OK, thanks for your reply!
Hello, thanks for your excellent work, and congratulations on being accepted by CVPR!
I have a question about the background knowledge retrieval mechanism. In this part, the queries and values are extracted from the foreground feature, and the keys are extracted from the codebook. In standard cross-attention, however, the keys and values are extracted from the same source, which differs from your method. Is there a reason for this design?
From my point of view, the role of the codebook in your work is to assign a weight to each foreground feature; the information in the codebook is not used directly. Maybe I can think of the codebook as a strong MLP in self-attention? Could you say more about how the codebook is used in BKRM?
If I have misunderstood something, please let me know! I look forward to your reply. Thanks a lot!