model implementation question

Hello, thank you for this excellent project.

While examining the code, I noticed that the object queries have shape [300, 256], and that after passing through 6 multi-head attention modules the output has shape [6, 1, 300, 256]. Why is only the last of these 6 outputs used, as in the line below? Is the "6" unrelated to the number of heads in multi-head attention?

out = {'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]}
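To make the shapes concrete, here is a minimal sketch of my current reading, which may be wrong: the leading 6 would be the number of stacked decoder layers (not attention heads), so `[-1]` selects the final layer's prediction, and the earlier five would only serve as auxiliary outputs for deep supervision during training. NumPy arrays stand in for the real tensors, and `num_classes`, `w_class`, `w_box`, and `aux_outputs` are names assumed here for illustration, not taken from the repository.

```python
import numpy as np

# Sizes taken from the shapes quoted above; num_classes is an assumption.
num_decoder_layers, batch_size, num_queries, hidden_dim = 6, 1, 300, 256
num_classes = 91

rng = np.random.default_rng(0)

# One decoder output per layer, stacked along a new leading axis.
hs = rng.standard_normal((num_decoder_layers, batch_size, num_queries, hidden_dim))

# Hypothetical stand-ins for the classification and box-regression heads.
w_class = rng.standard_normal((hidden_dim, num_classes))
w_box = rng.standard_normal((hidden_dim, 4))

outputs_class = hs @ w_class                          # (6, 1, 300, num_classes)
outputs_coord = 1.0 / (1.0 + np.exp(-(hs @ w_box)))   # (6, 1, 300, 4), sigmoid

# The final layer's prediction is the model output ...
out = {'pred_logits': outputs_class[-1], 'pred_boxes': outputs_coord[-1]}

# ... and the earlier layers can be kept separately as auxiliary outputs.
aux_outputs = [
    {'pred_logits': c, 'pred_boxes': b}
    for c, b in zip(outputs_class[:-1], outputs_coord[:-1])
]

print(out['pred_logits'].shape, out['pred_boxes'].shape, len(aux_outputs))
```

If this reading is right, the number of attention heads never appears as a separate axis here, because the heads are merged back into the 256-dimensional feature inside each attention module.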

Thank you.

fundamentalvision / Deformable-DETR
