IDEA-Research / DINO

[ICLR 2023] Official implementation of the paper "DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection"
Apache License 2.0

Attention points #154

Open ajay1234567899 opened 1 year ago

ajay1234567899 commented 1 year ago

Hello sir,

In deformable attention, we compute offsets and add them to the reference points. So, regarding the code in models/dino/ops/modules/ms_deform_attn.py:

Can we assume that the variable named sampling_locations stores the reference points? And can it be visualized simply by multiplying its values by the width and height of the actual image?

I'm looking forward to hearing back from you. Thank you

HaoZhang534 commented 1 year ago

`sampling_locations = reference_points + offsets`. There is also a variable named reference_points. You can visualize it after multiplying its values by the width and height of the actual image.
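For anyone following along, here is a minimal numpy sketch of that relation. The shapes and values below are made up for illustration and are much simpler than the real tensors in ms_deform_attn.py (which carry batch, head, and level dimensions); the point is only that predicted offsets are rescaled by the feature-map size so the addition happens in normalized [0, 1] coordinates:

```python
import numpy as np

# Hypothetical sizes for illustration (not the repo's actual tensors).
n_queries, n_points = 4, 2        # queries, sampling points per query
feat_w, feat_h = 32, 32           # one feature level's spatial size

rng = np.random.default_rng(0)
# Normalized (x, y) reference point per query, in [0, 1].
reference_points = rng.uniform(0.2, 0.8, size=(n_queries, 1, 2))
# Predicted offsets, expressed in feature-map units.
sampling_offsets = rng.normal(0.0, 2.0, size=(n_queries, n_points, 2))

# Dividing by the feature-map size keeps everything in normalized
# coordinates before the addition (mirroring the offset_normalizer idea).
offset_normalizer = np.array([feat_w, feat_h], dtype=np.float64)
sampling_locations = reference_points + sampling_offsets / offset_normalizer

print(sampling_locations.shape)  # (4, 2, 2): per query, per point, (x, y)
```

Each sampling location ends up near its reference point, displaced by the (rescaled) learned offset.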

ajay1234567899 commented 1 year ago

Hello sir, Thanks for your reply

Is it fair to say that sampling_locations are the points from which features are extracted and aggregated during cross-attention, and that these aggregated features are then used for classification and regression?

I am visualizing the sampling_locations by multiplying them by the width and height of the image. Is that the right thing to do?

I'm looking forward to hearing back from you. Thank you

HaoZhang534 commented 1 year ago


Your understanding is right.
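To make the confirmed visualization step concrete, here is a small sketch. The coordinate values and image size are hypothetical; it just shows the scaling from normalized (x, y) in [0, 1] to pixel coordinates that can be drawn on the original image:

```python
import numpy as np

# Hypothetical normalized sampling locations (x, y) for one query.
sampling_locations = np.array([[0.25, 0.50],
                               [0.75, 0.10]])

img_w, img_h = 640, 480  # size of the actual input image

# Multiply normalized coordinates by the image width/height to get pixels.
pixels = sampling_locations * np.array([img_w, img_h])
print(pixels)  # [[160. 240.]
               #  [480.  48.]]
```

These pixel coordinates can then be scattered on the image (e.g. with matplotlib) to see where the deformable attention is sampling.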