YifanXu74 / MQ-Det

Official PyTorch implementation of "Multi-modal Queried Object Detection in the Wild" (accepted by NeurIPS 2023)
Apache License 2.0

why two cross-attention? #29

Open PAradoxLG opened 10 months ago

PAradoxLG commented 10 months ago

In the paper, there are two cross-attention layers on the right side of Fig. 1. What does GCP gain from this structure? Looking forward to your reply!

YifanXu74 commented 10 months ago

Hi,

The two cross-attention layers are shared by all GCP modules. They serve to pre-filter the negative vision queries using the target image features: if a vision query does not appear in a target image, the two cross-attention layers curb its activation, so the GCPs focus more on the positive vision queries.
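To make the idea concrete, here is a minimal sketch of how two shared cross-attention layers could re-weight vision queries against the target image features before they reach the GCPs. This is not the actual MQ-Det code; the module name `SharedQueryFilter`, the sigmoid gate, and all dimensions are illustrative assumptions.

```python
# Hypothetical sketch of the pre-filtering idea described above -- NOT the
# actual MQ-Det implementation. Names and the gating mechanism are assumptions.
import torch
import torch.nn as nn


class SharedQueryFilter(nn.Module):
    """Two cross-attention layers, shared across all GCP modules, that
    down-weight vision queries not supported by the target image."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # first cross-attention: vision queries attend to image features
        self.q2i_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # second cross-attention: refine the attended queries against the image again
        self.refine_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # per-query scalar gate; queries absent from the image get a small weight
        self.gate = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())

    def forward(self, vision_queries, image_feats):
        # vision_queries: (B, Q, C), image_feats: (B, HW, C)
        attended, _ = self.q2i_attn(vision_queries, image_feats, image_feats)
        refined, _ = self.refine_attn(attended, image_feats, image_feats)
        # pre-filter: curb activation of queries with no support in the image
        weights = self.gate(refined)           # (B, Q, 1)
        return vision_queries * weights        # filtered queries fed to the GCPs


if __name__ == "__main__":
    filt = SharedQueryFilter(dim=256)
    queries = torch.randn(2, 5, 256)   # 5 vision queries
    feats = torch.randn(2, 49, 256)    # flattened 7x7 image feature map
    print(filt(queries, feats).shape)  # torch.Size([2, 5, 256])
```

Since the two attention layers are shared by every GCP module, they add only a small, class-count-independent parameter overhead while suppressing negative queries for all categories.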