PanchengZhao / LAKE-RED

[CVPR 2024] LAKE-RED: Camouflaged Images Generation by Latent Background Knowledge Retrieval-Augmented Diffusion.
Apache License 2.0

Confused about the role of the codebook in BKRM #2

Closed PeiChiChen closed 5 months ago

PeiChiChen commented 5 months ago

Hello, thanks for your excellent work, and congratulations on being accepted to CVPR!

I have a question about the background knowledge retrieval mechanism. In this part, the queries and values are extracted from the foreground features, and the keys are extracted from the codebook. In standard cross-attention, however, the keys and values come from the same source, which differs from your method. Is there a reason for this design?

From my point of view, the codebook in your work only provides a weight for each foreground feature; the information in the codebook itself is not used directly. Could I perhaps think of the codebook as a strong MLP in self-attention? Could you say more about how the codebook is used in BKRM?

If I have misunderstood something, please let me know! I look forward to your reply. Thanks a lot!

PanchengZhao commented 5 months ago

Thank you for reaching out. In fact, we use standard cross-attention, where the queries come from the foreground features, while the keys and values are both drawn from the codebook.

The codebook is viewed as a collection of rich visual features. BKRM exploits the consistency between foreground and background to retrieve background features from the codebook.
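To make the data flow concrete, here is a minimal NumPy sketch of cross-attention with queries from foreground features and keys and values from a codebook. It is not the code from this repository: the function name `codebook_cross_attention`, the projection matrices `w_q`, `w_k`, `w_v`, the random inputs, the dimensions, and the `sqrt(d)` scaling are all assumptions chosen for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def codebook_cross_attention(foreground, codebook, w_q, w_k, w_v):
    """Hypothetical helper: queries from the foreground, keys/values from the codebook."""
    q = foreground @ w_q  # (n_fg, d)
    k = codebook @ w_k    # (n_codes, d)
    v = codebook @ w_v    # (n_codes, d)
    # Similarity between each foreground token and each codebook entry.
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (n_fg, n_codes)
    weights = softmax(scores, axis=-1)       # each row sums to 1
    # Each output row is a weighted mix of codebook values,
    # so codebook content reaches the output directly.
    return weights @ v, weights


# Illustrative sizes only; none of these come from the paper.
rng = np.random.default_rng(0)
n_fg, n_codes, d_in, d = 6, 16, 8, 4
foreground = rng.normal(size=(n_fg, d_in))
codebook = rng.normal(size=(n_codes, d_in))
w_q, w_k, w_v = (rng.normal(size=(d_in, d)) for _ in range(3))

retrieved, weights = codebook_cross_attention(foreground, codebook, w_q, w_k, w_v)
print(retrieved.shape, weights.shape)
```

In this sketch the attention weights have one column per codebook entry, so the values they multiply must also have one row per codebook entry. This is why the codebook supplies the retrieved content and not only the weighting.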

Your confusion stems from an error in Equation 4. I sincerely apologize for this oversight; it will be corrected in the upcoming revised version.

PeiChiChen commented 5 months ago

OK, thanks for your reply!