Atten4Vis / ConditionalDETR

This repository is an official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence". (https://arxiv.org/abs/2108.06152)
Apache License 2.0

Visualization code of Figure 1 in paper. #5

Closed · MaureenZOU closed this 2 years ago

MaureenZOU commented 2 years ago

Hi Author,

First, thanks for your great work on improving the convergence speed of DETR by such a large margin. While reading the paper, I got a bit confused about how exactly you draw the attention maps in Figure 1.

Given an object query q (1 × d) and memory features m (d × hw), I use the following equation to draw the attention maps:

Similarity(q, m) = Softmax(proj(q) · proj(m)), of shape 1 × hw, where proj is the trained linear projection in the cross-attention module.
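For concreteness, this corresponds to something like the sketch below (PyTorch); `q_proj` and `k_proj` are placeholder names for the trained query/key projections rather than identifiers from this repository, and the per-head split and 1/sqrt(d) scaling are left out:

```python
import torch

def attention_map(q, m, q_proj, k_proj, h, w):
    """q: (1, d) object query; m: (d, h*w) flattened memory features;
    q_proj / k_proj: trained linear projections from the cross-attention module."""
    scores = q_proj(q) @ k_proj(m.t()).t()   # (1, h*w) dot-product similarity
    attn = scores.softmax(dim=-1)            # normalize over all spatial positions
    return attn.view(h, w)                   # reshape to the feature-map grid for plotting
```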

The attention maps I get are quite similar to the ones shown in the DETR paper:

A random object query: [screenshot]

A random object query on head A: [screenshot]

A random object query on head B: [screenshot]

A random object query on head C: [screenshot]

Could you please give some information on how to generate the attention maps in Figure 1? Thanks!

SISTMrL commented 2 years ago

Hello, have you managed to generate the attention maps like Fig. 1? @MaureenZOU

MaureenZOU commented 2 years ago

The problem was solved by the explanation in Section 3.4, in the paragraph "Comparison to DETR": instead of measuring the similarity against memory + positional encoding, the authors measure the similarity between the positional encodings only.
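For anyone else reproducing Figure 1, this is how I read that explanation as code (a minimal sketch under that assumption, not the repository's actual visualization code; all names are placeholders):

```python
import torch

def spatial_attention_map(pq, pk, h, w):
    """pq: (d,) positional part of one query; pk: (d, h*w) positional encodings of the memory."""
    scores = pq @ pk                          # (h*w,) similarity from positions only, no content term
    return scores.softmax(dim=-1).view(h, w)  # spatial map as in Figure 1
```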

GWwangshuo commented 2 years ago

> The problem was solved by the explanation in Section 3.4, in the paragraph "Comparison to DETR": instead of measuring the similarity against memory + positional encoding, the authors measure the similarity between the positional encodings only.

@MaureenZOU Could you please kindly provide the source code for visualizing the attention maps? That would be very helpful. Thanks a lot.

DeppMeng commented 2 years ago

Hi, @GWwangshuo @MaureenZOU @SISTMrL,

Thank you for your attention, and sorry for the late reply. We have not released the visualization code yet because we found it is not easy to write a neat and clean version of it. When we finish rewriting this part of the code, we will make a release (there is no fixed schedule yet; the authors are busy with upcoming deadlines).

Here is a brief guide (see the sketch after the list):

  1. Run the validation process and record: the content attention weights, the position attention weights, and the predictions.
  2. Filter out predictions with low classification scores, as well as objects that are too small.
  3. Plot the original image.
  4. Plot the content/position attention map on top of it.
  5. Plot the prediction box on top of it.
  6. Arrange the plots in whatever order you would like (e.g., by attention head).
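Until the official code is released, a rough skeleton of steps 2–6 could look like the following. Everything here is hypothetical (function and argument names are my own); `attn_maps` is assumed to be a list of per-head (h, w) attention maps recorded in step 1.

```python
import matplotlib.pyplot as plt

def visualize(image, boxes, scores, attn_maps, score_thresh=0.5, min_box_area=1024):
    """Plot per-head attention maps over the image with filtered prediction boxes."""
    fig, axes = plt.subplots(1, len(attn_maps), figsize=(4 * len(attn_maps), 4), squeeze=False)
    for ax, attn in zip(axes[0], attn_maps):            # step 6: one panel per attention head
        ax.imshow(image)                                 # step 3: original image
        # step 4: overlay the (h, w) attention map, stretched to the image size
        ax.imshow(attn, alpha=0.6, extent=(0, image.shape[1], image.shape[0], 0))
        for (x0, y0, x1, y1), score in zip(boxes, scores):
            # step 2: skip low-score predictions and too-small objects
            if score < score_thresh or (x1 - x0) * (y1 - y0) < min_box_area:
                continue
            # step 5: draw the prediction box
            ax.add_patch(plt.Rectangle((x0, y0), x1 - x0, y1 - y0,
                                       fill=False, color='lime', linewidth=2))
        ax.axis('off')
    plt.tight_layout()
    plt.show()
```
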
wulele2 commented 2 years ago

> The problem was solved by the explanation in Section 3.4, in the paragraph "Comparison to DETR": instead of measuring the similarity against memory + positional encoding, the authors measure the similarity between the positional encodings only.

Hello, when I tried to visualize DETR, I first read the output of the last decoder layer's self-attention to get cq: [100, 1, 256]. In addition, pq is read from the trained model (the learned query embeddings): [100, 256]. Then I take pk from the feature map's positional encoding: [1, 256, h, w]. Then I calculate ((cq + pq)^T * pk).softmax(-1).view(h, w), but I found the result is inconsistent with the figures. I really hope to get your reply.
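If I understand your description correctly, it corresponds to something like this sketch (shapes taken from your message; names are only for illustration). Note that it skips the per-head split and the 1/sqrt(d) scaling that the trained cross-attention applies, which could be one source of the mismatch:

```python
import torch

def detr_attention_map(cq, pq, pk, query_idx):
    """cq: (100, 1, 256) decoder content queries; pq: (100, 256) learned query
    embeddings; pk: (1, 256, h, w) positional encoding of the feature map."""
    h, w = pk.shape[-2:]
    q = cq[query_idx, 0] + pq[query_idx]        # (256,) content + position for one query
    k = pk.flatten(2)[0]                        # (256, h*w) flattened spatial keys
    return (q @ k).softmax(dim=-1).view(h, w)   # per-position similarity map
```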

Flyooofly commented 1 year ago

> Hi Author,
>
> First, thanks for your great work on improving the convergence speed of DETR by such a large margin. While reading the paper, I got a bit confused about how exactly you draw the attention maps in Figure 1.
>
> Given an object query q (1 × d) and memory features m (d × hw), I use the following equation to draw the attention maps:
>
> Similarity(q, m) = Softmax(proj(q) · proj(m)), of shape 1 × hw, where proj is the trained linear projection in the cross-attention module.
>
> The attention maps I get are quite similar to the ones shown in the DETR paper:
>
> A random object query: [screenshot]
>
> A random object query on head A: [screenshot]
>
> A random object query on head B: [screenshot]
>
> A random object query on head C: [screenshot]
>
> Could you please give some information on how to generate the attention maps in Figure 1? Thanks!

Hello, have you looked into how to visualize the attention weights of Deformable-DETR? I have been using the plotting code provided by DETR but have not been able to get correct results.