Closed yhcao6 closed 1 year ago
query_sine_embed = query_sine_embed[..., : self.embed_dim] * position_transform
After this slice, query_sine_embed has shape (..., 256), so the PE(w) and PE(h) channels are no longer part of it.
Therefore, we only normalize PE(y) with h'/h and PE(x) with w'/w.
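To make the shapes concrete, here is a minimal NumPy sketch of the slice-then-scale step. It is not the actual detrex code: the function name modulate_xy_embed is hypothetical, the channel layout [PE(y) | PE(x) | PE(w) | PE(h)] and the column orders (x, y, w, h) and (w', h') are assumptions made for illustration, and the multiplication by position_transform is left out.

```python
import numpy as np

def modulate_xy_embed(full_embed, anchors, ref_wh, embed_dim=256):
    """Keep the positional half of PE(A_q) and rescale it by the size ratios."""
    half = embed_dim // 2
    # Keep only the first embed_dim channels; the PE(w), PE(h) channels are dropped.
    xy_embed = full_embed[..., :embed_dim].copy()
    w, h = anchors[..., 2], anchors[..., 3]          # assumed column order: (x, y, w, h)
    w_ref, h_ref = ref_wh[..., 0], ref_wh[..., 1]    # assumed column order: (w', h')
    xy_embed[..., :half] *= (h_ref / h)[..., None]   # assumed PE(y) block, scaled by h'/h
    xy_embed[..., half:] *= (w_ref / w)[..., None]   # assumed PE(x) block, scaled by w'/w
    return xy_embed

rng = np.random.default_rng(0)
full_embed = rng.standard_normal((300, 2, 512))      # stand-in for PE(A_q), bs = 2
anchors = rng.uniform(0.1, 0.9, size=(300, 2, 4))    # stand-in for obj_center
ref_wh = rng.uniform(0.1, 0.9, size=(300, 2, 2))     # stand-in for ref_hw_cond
out = modulate_xy_embed(full_embed, anchors, ref_wh)
print(out.shape)  # (300, 2, 256)
```

Each 128-channel block is multiplied by exactly one ratio, and neither ratio touches PE(w) or PE(h) because those channels were removed by the slice.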
I'm closing this issue. Feel free to reopen it if you need any further help.
Dear author, thanks for your great work! I have a few questions about the implementation of DAB-DETR, specifically the DAB decoder code below:
https://github.com/IDEA-Research/detrex/blob/a11157fd5fec532e1ea3784c9d5ac6d5f0a148fc/projects/dab_detr/modeling/dab_transformer.py#L199-L206
My understanding of the code:
query_sine_embed = PE(A_q), shape=(300, bs, 512), represent (300, bs, (PE(x), PE(y), PE(w), PE(h)))
self.embed_dim = 256
obj_center: shape=(300, bs, 4), represents coordinates of anchors (300, bs, (x, y, h, w))
ref_hw_cond: shape=(300, bs, 2), represents the estimated h' and w' of the reference point (300, bs, (h', w'))
Question:
So why normalize all of PE(y), PE(h), PE(w) with w' / w?