aim-uofa / FreeCustom

[CVPR 2024] Official PyTorch implementation of FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition
https://aim-uofa.github.io/FreeCustom/
MIT License

Confusion about the implementation of the weighted mask #4

Open wzic opened 2 weeks ago

wzic commented 2 weeks ago

Hi!

[screenshot: the weighted mask equation from the paper]

[screenshot: the corresponding attention code]

In the paper, the weighted mask is multiplied with the product of Q and K. However, in the code, the result of refmask.masked_fill() is added to sim_ref. In that case, the weight on the mask is not multiplied with QK but added to it. May I know the difference between these two strategies, which seem different? Or is my understanding wrong?

In addition, may I know the recommended weight value for single-concept customization? Is it still 3?
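
For concreteness, here is a minimal sketch of the two strategies as I understand them (simplified single-head attention; the names `w`, `ref_mask`, and the functions are mine, not the repo's identifiers):

```python
import torch

# Illustrative shapes: q is (b, i, d), k_ref is (b, j, d), and ref_mask is
# a 0/1 float tensor broadcastable to (b, i, j), with 1 on concept pixels.

def sim_ref_multiply(q, k_ref, ref_mask, w=3.0):
    # The paper's equation as I read it: element-wise multiply a weighted
    # mask onto the QK^T similarity map before softmax.
    sim_ref = torch.einsum('bid,bjd->bij', q, k_ref) * q.shape[-1] ** -0.5
    # Note: background scores become 0 here, not -inf.
    return sim_ref * (ref_mask * w)

def sim_ref_add(q, k_ref, ref_mask, w=3.0):
    # The code's behavior as I read it: masked_fill() puts -inf outside
    # the concept region, then the weight w is *added* inside it.
    sim_ref = torch.einsum('bid,bjd->bij', q, k_ref) * q.shape[-1] ** -0.5
    sim_ref = sim_ref.masked_fill(ref_mask == 0, float('-inf'))
    return sim_ref + w * ref_mask  # additive emphasis
```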

dingangui commented 1 week ago

Either addition or multiplication is a good choice; we just need a way to emphasize the given concepts.
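
One way to see why addition also emphasizes: softmax exponentiates its input, so adding w to the masked scores multiplies their unnormalized attention weights by e^w. A quick illustrative check (not repo code):

```python
import torch

torch.manual_seed(0)
sim = torch.randn(8)                    # raw attention scores
mask = torch.tensor([1., 1., 0., 0., 0., 0., 0., 0.])
w = 3.0

# Additive bias in logit space.
biased = (sim + w * mask).softmax(dim=-1)

# Equivalent multiplicative view: scale exp(sim) by e^w on the masked
# entries, then renormalize.
scaled = sim.exp() * torch.exp(w * mask)
manual = scaled / scaled.sum()

print(torch.allclose(biased, manual))   # True
```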

wzic commented 1 week ago

I understand that masked_fill() with -inf suppresses the unwanted part of K. But in the addition setting, the mask weight is simply added to the original attention values. It is not intuitive how the added weight makes Q attend more to the masked part of K. Could you elaborate on this?

dingangui commented 1 week ago

> I understand that masked_fill() with -inf suppresses the unwanted part of K. But in the addition setting, the mask weight is simply added to the original attention values. It is not intuitive how the added weight makes Q attend more to the masked part of K. Could you elaborate on this?

Actually, masked_fill() with -inf is used to suppress the unwanted parts of the attention map, which is sim_ref in the code. The weighted mask is not applied to the feature K but to the attention map: it emphasizes the regions of the attention map that correspond to the reference concepts.
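
A minimal sketch of this idea, assuming (as in the paper) that the reference keys are concatenated with the target's own keys so the two sets of scores compete in one softmax; the names and details here are illustrative, not the repo's exact code:

```python
import torch

def weighted_mask_attention(sim_self, sim_ref, ref_mask, w=3.0):
    # sim_self: target's own QK^T scores, shape (b, i, j_self)
    # sim_ref:  scores between target queries and reference keys, (b, i, j_ref)
    # ref_mask: 1 on reference-concept keys, 0 elsewhere, broadcastable to sim_ref

    # 1) Suppress the unwanted (non-concept) part of the attention map.
    sim_ref = sim_ref.masked_fill(ref_mask == 0, float('-inf'))

    # 2) Emphasize the concept region. The added w is not a no-op here
    #    because sim_ref competes with sim_self in the joint softmax.
    sim_ref = sim_ref + w * ref_mask

    # 3) Joint softmax over the target's own keys and the reference keys.
    sim = torch.cat([sim_self, sim_ref], dim=-1)
    return sim.softmax(dim=-1)
```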