kuixu closed this issue 2 years ago
Thanks for your attention. The order of pos_y and pos_x simply follows the default order used in the DETR positional encoding; please refer to https://github.com/Atten4Vis/ConditionalDETR/blob/0b04a859c7fac33a866fcdea06f338610ba6e9d8/models/position_encoding.py#L55
Great work on improving the training convergence of DETR! One minor question: what's the main purpose of concatenating (pos_y, pos_x) instead of (pos_x, pos_y) in the decoder? https://github.com/Atten4Vis/ConditionalDETR/blob/0b04a859c7fac33a866fcdea06f338610ba6e9d8/models/transformer.py#L45
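For anyone else who lands here, a small pure-Python sketch may help show why the decoder would want the same (pos_y, pos_x) order as the encoder-side positional encoding. This is not the repository's code: `sine_embed`, `encode_yx`, and `encode_xy` are hypothetical helpers, the feature size of 4 is arbitrary, and the dot product stands in for attention only as a simplification. The assumption is that a query embedding and a key embedding for the same point should line up feature by feature.

```python
import math

def sine_embed(value, num_feats=4, temperature=10000.0):
    """Interleaved sin/cos embedding of one normalized coordinate in [0, 1].

    Hypothetical helper written for illustration, not the repository's function.
    """
    scaled = value * 2 * math.pi
    out = []
    for i in range(0, num_feats, 2):
        freq = temperature ** (i / num_feats)
        out.append(math.sin(scaled / freq))
        out.append(math.cos(scaled / freq))
    return out

def encode_yx(x, y):
    # y features first, then x features: the order discussed in this thread
    return sine_embed(y) + sine_embed(x)

def encode_xy(x, y):
    # the swapped order the question asks about
    return sine_embed(x) + sine_embed(y)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

x, y = 0.25, 0.75
key = encode_yx(x, y)              # key-side encoding for the point
matched = dot(encode_yx(x, y), key)  # query uses the same order
swapped = dot(encode_xy(x, y), key)  # query uses the opposite order
```

With the matched order, every sin/cos pair of the query meets the same pair in the key, so the similarity for an identical point is maximal. With the swapped order, x features are compared against y features, and the similarity drops whenever x differs from y. So the order itself looks arbitrary; what matters in this sketch is that both sides agree.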