wzic opened 2 weeks ago
Either addition or multiplication is a good choice; we just need a method to emphasize the given concepts.
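For concreteness, here is a minimal sketch of the two weighting strategies on a toy attention map. All shapes, the `mask` layout, and the `weight` value are illustrative, not taken from the FreeCustom code:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
sim = torch.randn(1, 4, 4)  # toy attention logits, i.e. Q @ K^T / sqrt(d)
mask = torch.tensor([[[False, True, True, False]]]).expand_as(sim)  # region to emphasize
weight = 3.0

# Multiplicative weighting: scale the logits inside the masked region.
attn_mul = F.softmax(torch.where(mask, sim * weight, sim), dim=-1)

# Additive weighting: add a constant bonus to the logits inside the region.
attn_add = F.softmax(torch.where(mask, sim + weight, sim), dim=-1)

# Both shift probability mass toward the masked keys after softmax.
print(attn_mul[0, 0])
print(attn_add[0, 0])
```

One caveat with pure multiplication: wherever a logit is negative, scaling it by a weight greater than 1 makes it more negative and therefore suppresses it further, whereas adding a positive constant always raises the masked logits. That may be one practical reason to prefer addition, though this is my reading rather than something stated in the code.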
I understand that masked_fill() with -inf is for suppressing the unwanted parts of K. But in the addition setting, the weight on the mask is simply added to the original attention value. It is not intuitive how the added weight makes Q attend more to the masked K. Could you elaborate on this?
Actually, masked_fill() with -inf is used to suppress the unwanted parts of the attention map, which is sim_ref in the code. The weighted mask is applied not to the feature K but to the attention map: it emphasizes the areas of the attention map corresponding to the reference concepts.
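A minimal sketch of the two operations described above, assuming sim_ref holds the raw attention logits. Only `sim_ref` is a name from the code; `ref_mask`, `weight`, and the shapes are illustrative:

```python
import torch
import torch.nn.functional as F

sim_ref = torch.randn(2, 4, 4)         # raw attention map, Q @ K^T / sqrt(d)
ref_mask = torch.zeros(2, 4, 4, dtype=torch.bool)
ref_mask[..., 1:3] = True              # key positions belonging to the concept

# Suppression: fill logits outside the concept with -inf, so softmax
# assigns those positions exactly zero probability.
attn_suppressed = F.softmax(sim_ref.masked_fill(~ref_mask, float("-inf")), dim=-1)

# Emphasis: add a positive weight on the concept region, so softmax shifts
# probability mass toward it without zeroing everything else.
weight = 3.0
attn_emphasized = F.softmax(sim_ref + ref_mask.float() * weight, dim=-1)
```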
Hi!
![image](https://github.com/aim-uofa/FreeCustom/assets/55567040/bfb41ffd-6487-4816-b267-9024afe94a3e)
In the paper, the weighted mask is multiplied with the product of Q and K. However, in the code, refmask.masked_fill() is added to sim_ref, so the weight on the mask is not multiplied with QK but added. May I know the difference between these two strategies, which seem different? Or is my understanding wrong?
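If I write it out (assuming the usual $1/\sqrt{d}$ scaling, with $M_w$ the weighted mask), the two strategies seem to be:

$$
A_{\text{paper}} = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d}} \odot M_w\right)
\qquad\text{vs.}\qquad
A_{\text{code}} = \mathrm{softmax}\!\left(\frac{QK^\top}{\sqrt{d}} + M_w\right)
$$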
In addition, may I know the recommended weight value for single-concept customization? Is it still 3?