IDEA-Research / detrex

detrex is a research platform for DETR-based object detection, segmentation, pose estimation and other visual recognition tasks.
https://detrex.readthedocs.io/en/latest/
Apache License 2.0

Question about DAB DETR Implementation #125

Closed yhcao6 closed 1 year ago

yhcao6 commented 2 years ago

Dear author, thanks for your great work! I have a few questions about the implementation of DAB DETR, specifically the DAB decoder code below:

https://github.com/IDEA-Research/detrex/blob/a11157fd5fec532e1ea3784c9d5ac6d5f0a148fc/projects/dab_detr/modeling/dab_transformer.py#L199-L206

My understanding of the code:

query_sine_embed = PE(A_q): shape=(300, bs, 512), represents (300, bs, (PE(x), PE(y), PE(w), PE(h)))
self.embed_dim = 256
obj_center: shape=(300, bs, 4), represents the anchor coordinates (300, bs, (x, y, w, h))
ref_hw_cond: shape=(300, bs, 2), represents the estimated w' and h' of the reference point (300, bs, (w', h'))

Question:

The code: query_sine_embed[..., self.embed_dim // 2 :] *= (ref_hw_cond[..., 0] / obj_center[..., 2] 

query_sine_embed[..., self.embed_dim // 2 :]: shape=(300, bs, 384), represents (300, bs, (PE(y), PE(w), PE(h)))
ref_hw_cond[..., 0] : shape=(300, bs, 1), represents (300, bs, w')
obj_center[..., 2]: shape=(300, bs, 1), represents (300, bs, w)

So why are PE(y), PE(w), and PE(h) all scaled by w' / w?

SlongLiu commented 2 years ago

Please see https://github.com/IDEA-Research/detrex/blob/a11157fd5fec532e1ea3784c9d5ac6d5f0a148fc/projects/dab_detr/modeling/dab_transformer.py#L197:

query_sine_embed = query_sine_embed[..., : self.embed_dim] * position_transform

After this line, query_sine_embed has shape (..., 256), not (..., 512).

Therefore, the slice self.embed_dim // 2 : covers only channels 128 to 255, and we only scale PE(y) by h'/h and PE(x) by w'/w.
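
To make the slicing concrete, here is a small label-level sketch in plain Python (no tensors). It assumes the 512-wide embedding is four 128-channel blocks laid out as (PE(y), PE(x), PE(w), PE(h)). That order is inferred from the reply above and differs from the order listed in the question; it has not been checked against the repository here. The names layout, kept, first_half, second_half and untruncated are made up for illustration.

```python
# Label-level sketch of the slicing discussed in this thread (no tensors).
EMBED_DIM = 256          # self.embed_dim, as stated in the issue
BLOCK = EMBED_DIM // 2   # 128 channels per coordinate (512 / 4)

# ASSUMPTION: the 512-wide embedding is laid out as (PE(y), PE(x), PE(w), PE(h)).
# This order is inferred from the reply, not read from the repository.
layout = [name for name in ("PE(y)", "PE(x)", "PE(w)", "PE(h)") for _ in range(BLOCK)]

# Step 1: query_sine_embed[..., : self.embed_dim] keeps the first 256 channels.
kept = layout[:EMBED_DIM]

# Step 2: after truncation, each half-slice covers exactly one block.
second_half = sorted(set(kept[EMBED_DIM // 2:]))  # the slice scaled by w'/w
first_half = sorted(set(kept[:EMBED_DIM // 2]))   # the slice scaled by h'/h, per the reply

# For comparison: the same slice applied to the untruncated 512-wide embedding,
# which is the reading behind the (300, bs, 384) shape in the question.
untruncated = sorted(set(layout[EMBED_DIM // 2:]))

print(len(kept), first_half, second_half)
print(len(layout[EMBED_DIM // 2:]), untruncated)
```

Under that assumed layout, the 384-channel slice only appears if the truncation to self.embed_dim is skipped; once it is applied, each half-slice touches a single coordinate's encoding.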

rentainhe commented 1 year ago

I'm closing this issue. Feel free to reopen it if you need any help.