fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

Visualization of deformable attention reference points #109

Open vadimkantorov opened 3 years ago

vadimkantorov commented 3 years ago

Could you please publish the code for visualizations like in Figure 6 in the paper, if you still have this snippet?

Thank you!

LISIJIA0629 commented 3 years ago

have you gotten results?

vadimkantorov commented 3 years ago

I've found an obscure repo with an example of this https://github.com/duongnv0499/Explain-Deformable-DETR/, but haven't tried it yet

GivanTsai commented 2 years ago

Have you figured out how to draw this attention map correctly? Thanks @vadimkantorov

Flyooofly commented 2 years ago

> I've found an obscure repo with an example of this https://github.com/duongnv0499/Explain-Deformable-DETR/, but haven't tried it yet

Have you figured out how to draw this attention map correctly? Thanks @vadimkantorov

feiji52633 commented 2 years ago

Have you figured out how to draw this attention map correctly? Thanks

tdchua commented 2 years ago

Waiting for an update! My approach was to create a map of zeros matching the feature map size, then collect the sampling locations and attention weights. For each sampling location, add its attention weight into the map, so positions that are attended to more heavily accumulate higher values. (This is just my idea.)
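A minimal NumPy sketch of the idea above (splat each sampling location's attention weight into a zero map). All names here are illustrative, not from the Deformable-DETR codebase; in the real model the locations and weights would come out of the deformable attention module.

```python
import numpy as np

def splat_attention(h, w, sampling_locations, attention_weights):
    """Accumulate attention weights into an (h, w) heatmap.

    sampling_locations: iterable of (x, y) normalized to [0, 1]
    attention_weights:  iterable of scalar weights (one per location)
    """
    heat = np.zeros((h, w), dtype=np.float32)
    for (x, y), a in zip(sampling_locations, attention_weights):
        # nearest-pixel "splat"; a bilinear splat would be smoother
        col = min(int(x * w), w - 1)
        row = min(int(y * h), h - 1)
        heat[row, col] += a  # heavier attention -> larger value
    return heat

# toy example: two sampling points for one query
heat = splat_attention(4, 4, [(0.5, 0.5), (0.9, 0.1)], [0.7, 0.3])
```

The resulting map can then be upsampled to image size and overlaid (e.g. with `matplotlib.pyplot.imshow` and some alpha blending).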

hg6185 commented 1 year ago

Hey, did you manage it in the end? I am trying to do it for DINO, which uses the same attention mechanism as well :)

2015wan commented 1 year ago

Have you figured out how to draw this attention map correctly? Thanks @vadimkantorov @GivanTsai

gaowenjie-star commented 8 months ago

> Hey, did you manage it in the end? I am trying to do it for DINO, which uses the same attention mechanism as well :)

Did you manage it? I'm also stuck on the attention visualisation!

hg6185 commented 8 months ago

I did it like this, but it's kind of hacky: to visualise it you'll need the reference points where the model is attending.

Simple hooks on the PyTorch model were not sufficient, as they only extracted the layer weights.

Therefore I modified the attention layer and pushed the sampling locations into a global list (make sure to detach them from your GPU). This list I then visualised. Depending on the model, you'd also have to export the indices of the top-k points that were mapped to the actual detections; otherwise the attention is visualised for the wrong objects.

I hope this helps, feel free to ask though.
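The "push into a global list" capture pattern described above can be sketched as follows. This is a stand-in illustration, not the actual patch: in the Deformable-DETR repo the `append` would likely go inside `MSDeformAttn.forward` (`models/ops/modules/ms_deform_attn.py`), and with real tensors you would store `sampling_locations.detach().cpu()`.

```python
import numpy as np

CAPTURED = []  # global buffer: one entry per attention forward pass

def deform_attn_forward(query, sampling_locations, attention_weights):
    """Stand-in for a deformable attention forward pass.

    The only relevant part for visualisation is the capture: copy the
    sampling locations and weights out before returning, so they can
    be plotted after inference.
    """
    # with torch: CAPTURED.append((sampling_locations.detach().cpu(),
    #                              attention_weights.detach().cpu()))
    CAPTURED.append((np.array(sampling_locations, dtype=np.float32),
                     np.array(attention_weights, dtype=np.float32)))
    # ... the real attention math would go here ...
    return query  # placeholder output

# usage: run the model once, then read CAPTURED layer by layer
q = np.zeros((1, 4), dtype=np.float32)
deform_attn_forward(q, [[0.2, 0.3]], [[0.9]])
```

Clearing `CAPTURED` between images keeps the buffer aligned with the current forward pass.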

gaowenjie-star commented 8 months ago

> I did it like this, but it's kind of hacky: to visualise it you'll need the reference points where the model is attending.
>
> Simple hooks on the PyTorch model were not sufficient, as they only extracted the layer weights.
>
> Therefore I modified the attention layer and pushed the sampling locations into a global list (make sure to detach them from your GPU). This list I then visualised. Depending on the model, you'd also have to export the indices of the top-k points that were mapped to the actual detections; otherwise the attention is visualised for the wrong objects.
>
> I hope this helps, feel free to ask though.

Thank you for your reply. I have some questions: I am using the DINO detection model in the mmdetection framework and would like to visualise it this way. I can now get the reference_points and sampling_locations, but visualising them is a bit difficult. What do you mean by the locations and indices in your reply? If possible, could you provide the code for your visualisation? Thank you very much!

hg6185 commented 8 months ago

If you look at the DINO architecture (https://arxiv.org/abs/2203.03605), you will find a top-k query selection: the model output (at least in detrex) is ordered by relevance. This reordering happens after the decoder, so your extracted points will be in a different order.

Based on your confidence threshold, you are probably going to output fewer than 20 or so detections.
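The reordering issue can be sketched like this: captured sampling points are stored in decoder query order, so they must be re-indexed with the same top-k indices the model uses when selecting its final detections. The arrays below are toy data, not values from any real model.

```python
import numpy as np

num_queries = 6
# per-query confidence scores (toy values)
scores = np.array([0.1, 0.9, 0.3, 0.8, 0.2, 0.05])
# captured points, one (x, y) per query, in decoder order
points = np.arange(num_queries * 2).reshape(num_queries, 2)

k = 2
# indices of the k most confident queries, descending by score
topk = np.argsort(-scores)[:k]
# re-index so row i of points_for_dets matches detection i
points_for_dets = points[topk]
```

Without this step, the plotted reference points belong to the wrong detections.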

Further, try to plot it separately first and then match colours with the respective objects. I can't provide you code, since I no longer have access to the repo, and I also used detectron2.