fundamentalvision / Deformable-DETR

Deformable DETR: Deformable Transformers for End-to-End Object Detection.
Apache License 2.0

model implementation question #206

Open shamanneo opened 1 year ago

shamanneo commented 1 year ago

Hello, thank you for this excellent project.

While examining the code, I noticed that the object queries have shape [300, 256], and that after passing through 6 multi-head attentions the output has shape [6, 1, 300, 256]. Why do we take only the last of these six outputs? Is the "6" not related to the concept of multi-head attention?

out = {'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]}

Thank you.
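
For context, here is a minimal sketch of where a [6, 1, 300, 256] shape can come from. It assumes the leading 6 is the number of stacked decoder layers (each of which contains multi-head attention internally) rather than the number of attention heads, and that the output of every layer is kept and stacked. NumPy stands in for PyTorch tensors, and `toy_decoder_layer` and `run_decoder` are hypothetical names for illustration, not functions from this repository.

```python
import numpy as np

# Assumed sizes, taken from the shapes quoted in the question.
NUM_DECODER_LAYERS = 6   # assumption: decoder depth, not the number of heads
BATCH_SIZE = 1
NUM_QUERIES = 300
HIDDEN_DIM = 256


def toy_decoder_layer(queries, layer_idx):
    """Hypothetical stand-in for one decoder layer.

    A real layer would apply self-attention, cross-attention and an FFN;
    here we only shift the values so each layer's output is distinct.
    The shape [batch, num_queries, hidden_dim] is preserved.
    """
    return queries + 0.1 * (layer_idx + 1)


def run_decoder(queries):
    """Run the layers in sequence and keep every intermediate output."""
    intermediate = []
    out = queries
    for i in range(NUM_DECODER_LAYERS):
        out = toy_decoder_layer(out, i)  # layer i refines the output of layer i-1
        intermediate.append(out)
    # Stack along a new leading axis: [num_layers, batch, num_queries, hidden_dim]
    return np.stack(intermediate)


queries = np.zeros((BATCH_SIZE, NUM_QUERIES, HIDDEN_DIM))
hs = run_decoder(queries)

final = hs[-1]   # output of the last (most refined) layer
aux = hs[:-1]    # outputs of the earlier layers

print(hs.shape, final.shape, aux.shape)
```

Under these assumptions, indexing with `[-1]` selects the output of the final layer, which has already been refined by all earlier layers. The earlier outputs are typically useful only during training, for example as auxiliary supervision.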