Atten4Vis / ConditionalDETR

This repository is an official implementation of the ICCV 2021 paper "Conditional DETR for Fast Training Convergence". (https://arxiv.org/abs/2108.06152)
Apache License 2.0

Visualization code of Figure 1 in paper. #5

Closed · MaureenZOU closed this 2 years ago

MaureenZOU commented 2 years ago

Hi Author,

First, thanks for your great work on improving the convergence speed of DETR by such a large margin. While reading the paper, I got a bit confused about how exactly you draw the attention maps in Figure 1.

Given an object query q (1 × d) and memory features m (d × hw), I use the following equation to draw the attention maps:

Similarity(q, m) = Softmax(proj(q) · proj(m)), of shape 1 × hw, where proj is the trained linear projection in the cross-attention module.
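For concreteness, this corresponds to something like the sketch below (PyTorch); `q_proj` and `k_proj` are placeholder names for the trained query/key projections rather than identifiers from this repository, and the per-head split and 1/sqrt(d) scaling are left out:

```python
import torch

def attention_map(q, m, q_proj, k_proj, h, w):
    """q: (1, d) object query; m: (d, h*w) flattened memory features;
    q_proj / k_proj: trained linear projections from the cross-attention module."""
    scores = q_proj(q) @ k_proj(m.t()).t()   # (1, h*w) dot-product similarity
    attn = scores.softmax(dim=-1)            # normalize over all spatial positions
    return attn.view(h, w)                   # reshape to the feature-map grid for plotting
```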

The attention maps I get are quite similar to the ones shown in the DETR paper:

A random object query: [screenshot]

A random object query on head A: [screenshot]

A random object query on head B: [screenshot]

A random object query on head C: [screenshot]

Could you please give some information on how to generate the attention maps in Figure 1? Thanks!

SISTMrL commented 2 years ago

Hello, have you managed to generate the attention maps like Fig. 1? @MaureenZOU

MaureenZOU commented 2 years ago

The problem was solved by the explanation in Section 3.4, in the paragraph "Comparison to DETR": instead of measuring the similarity against memory + positional encoding, the authors measure the similarity between the positional encodings only.
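For anyone else reproducing Figure 1, this is how I read that explanation as code (a minimal sketch under that assumption, not the repository's actual visualization code; all names are placeholders):

```python
import torch

def spatial_attention_map(pq, pk, h, w):
    """pq: (d,) positional part of one query; pk: (d, h*w) positional encodings of the memory."""
    scores = pq @ pk                          # (h*w,) similarity from positions only, no content term
    return scores.softmax(dim=-1).view(h, w)  # spatial map as in Figure 1
```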

GWwangshuo commented 2 years ago

> The problem was solved by the explanation in Section 3.4, in the paragraph "Comparison to DETR": instead of measuring the similarity against memory + positional encoding, the authors measure the similarity between the positional encodings only.

@MaureenZOU Could you please kindly provide the source code for visualizing the attention maps? That would be very helpful. Thanks a lot.

DeppMeng commented 2 years ago

Hi, @GWwangshuo @MaureenZOU @SISTMrL,

Thank you for your attention, and sorry for the late reply. We have not released the visualization code yet because we found it is not easy to write a neat and clean version of it. When we finish rewriting this part of the code, we will make a release (there is no fixed schedule yet; the authors are busy with upcoming deadlines).

Here is a brief guide (see the sketch after the list):

  1. Run the validation process and record: the content attention weights, the position attention weights, and the predictions.
  2. Filter out predictions with low classification scores, as well as objects that are too small.
  3. Plot the original image.
  4. Plot the content/position attention map on top of it.
  5. Plot the prediction box on top of it.
  6. Arrange the plots in whatever order you would like (e.g., by attention head).
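Until the official code is released, a rough skeleton of steps 2–6 could look like the following. Everything here is hypothetical (function and argument names are my own); `attn_maps` is assumed to be a list of per-head (h, w) attention maps recorded in step 1.

```python
import matplotlib.pyplot as plt

def visualize(image, boxes, scores, attn_maps, score_thresh=0.5, min_box_area=1024):
    """Plot per-head attention maps over the image with filtered prediction boxes."""
    fig, axes = plt.subplots(1, len(attn_maps), figsize=(4 * len(attn_maps), 4), squeeze=False)
    for ax, attn in zip(axes[0], attn_maps):            # step 6: one panel per attention head
        ax.imshow(image)                                 # step 3: original image
        # step 4: overlay the (h, w) attention map, stretched to the image size
        ax.imshow(attn, alpha=0.6, extent=(0, image.shape[1], image.shape[0], 0))
        for (x0, y0, x1, y1), score in zip(boxes, scores):
            # step 2: skip low-score predictions and too-small objects
            if score < score_thresh or (x1 - x0) * (y1 - y0) < min_box_area:
                continue
            # step 5: draw the prediction box
            ax.add_patch(plt.Rectangle((x0, y0), x1 - x0, y1 - y0,
                                       fill=False, color='lime', linewidth=2))
        ax.axis('off')
    plt.tight_layout()
    plt.show()
```
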
wulele2 commented 2 years ago

> The problem was solved by the explanation in Section 3.4, in the paragraph "Comparison to DETR": instead of measuring the similarity against memory + positional encoding, the authors measure the similarity between the positional encodings only.

Hello, when I tried to visualize DETR, I first read the output of the last decoder layer's self-attention to get cq: [100, 1, 256]. In addition, pq is read from the trained model (the learned query embeddings): [100, 256]. Then I take pk from the feature map's positional encoding: [1, 256, h, w]. Then I calculate ((cq + pq)^T * pk).softmax(-1).view(h, w), but I found the result is inconsistent with the figures. I really hope to get your reply.
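If I understand your description correctly, it corresponds to something like this sketch (shapes taken from your message; names are only for illustration). Note that it skips the per-head split and the 1/sqrt(d) scaling that the trained cross-attention applies, which could be one source of the mismatch:

```python
import torch

def detr_attention_map(cq, pq, pk, query_idx):
    """cq: (100, 1, 256) decoder content queries; pq: (100, 256) learned query
    embeddings; pk: (1, 256, h, w) positional encoding of the feature map."""
    h, w = pk.shape[-2:]
    q = cq[query_idx, 0] + pq[query_idx]        # (256,) content + position for one query
    k = pk.flatten(2)[0]                        # (256, h*w) flattened spatial keys
    return (q @ k).softmax(dim=-1).view(h, w)   # per-position similarity map
```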

Flyooofly commented 1 year ago

> Hi Author,
>
> First, thanks for your great work on improving the convergence speed of DETR by such a large margin. While reading the paper, I got a bit confused about how exactly you draw the attention maps in Figure 1.
>
> Given an object query q (1 × d) and memory features m (d × hw), I use the following equation to draw the attention maps:
>
> Similarity(q, m) = Softmax(proj(q) · proj(m)), of shape 1 × hw, where proj is the trained linear projection in the cross-attention module.
>
> The attention maps I get are quite similar to the ones shown in the DETR paper:
>
> A random object query: [screenshot]
>
> A random object query on head A: [screenshot]
>
> A random object query on head B: [screenshot]
>
> A random object query on head C: [screenshot]
>
> Could you please give some information on how to generate the attention maps in Figure 1? Thanks!

Hello, have you looked into how to visualize the attention weights of Deformable-DETR? I have been using the plotting code provided by DETR but have not been able to get correct results.