Closed: Haerxu closed this 9 months ago
I think a simplified way to view it is as a "soft mask" that directs the attention of the CLIP model. However, it goes a little beyond a soft mask: we found it can also somewhat improve the original CLIP model's recognition ability on stuff classes.
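To make the "soft mask" reading concrete, here is a minimal sketch, assuming the bias is simply added to the attention logits before the softmax. It is a generic illustration and not the repository's actual code: the names `attention_weights`, `soft_bias`, and `hard_mask` and all the numbers are made up for the example.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys, bias):
    # Scaled dot-product logits for one query, plus an additive per-key bias.
    # (Hypothetical helper; the bias plays the role of the "attention bias".)
    d = len(query)
    logits = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) + b
        for key, b in zip(keys, bias)
    ]
    return softmax(logits)

# One query token attending to four key tokens (toy values).
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]

no_bias = [0.0, 0.0, 0.0, 0.0]                  # plain attention
soft_bias = [2.0, 2.0, -2.0, -2.0]              # favors keys 0 and 1, keeps the rest visible
hard_mask = [0.0, 0.0, -math.inf, -math.inf]    # binary mask: keys 2 and 3 removed entirely

w_plain = attention_weights(query, keys, no_bias)
w_soft = attention_weights(query, keys, soft_bias)
w_hard = attention_weights(query, keys, hard_mask)

print("plain:", w_plain)
print("soft :", w_soft)
print("hard :", w_hard)
```

With the soft bias, the first two keys get more weight than in plain attention but the other two still contribute a little, whereas the hard mask drops them to exactly zero. That is the sense in which the bias directs attention without strictly cutting anything out.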
Hi,
How should I understand the attention bias generated by the adapter? I don't see what its purpose is.