Questions about the Iterative Bounding Box Refinement based on the Single-Stage pipeline #54

jialeli1 commented 3 years ago

Hi. As mentioned in A.4 of your paper, both the center coordinates (x, y) and the dimensions (w, h) of the box are refined iteratively, with the initial box set to b_w = 0.1 and b_h = 0.1. However, in the code (https://github.com/fundamentalvision/Deformable-DETR/blob/11169a60c33333af00a4849f1808023eba96a931/models/deformable_detr.py#L172), under the single-stage setting (two_stage=False), reference.shape[-1] seems to always be 2, i.e. the reference contains only the center coordinates, which suggests that only the center is refined iteratively. The box dimensions seem to be predicted independently at each decoder layer, which is inconsistent with the paper and really puzzles me.
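For concreteness, the update described in A.4 can be sketched in plain Python as below. This is only an illustration of the rule as stated above, not code from the repository: `refine_box`, the clamping value `eps`, the starting center, and the per-layer offsets are all made-up names and numbers.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def inverse_sigmoid(p: float, eps: float = 1e-5) -> float:
    # Clamp away from 0 and 1 so the log stays finite (eps is an assumed value).
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def refine_box(prev_box, deltas):
    """One refinement step: new = sigmoid(delta + inverse_sigmoid(prev)),
    applied to each of (x, y, w, h) in normalized [0, 1] coordinates."""
    return tuple(sigmoid(d + inverse_sigmoid(b)) for b, d in zip(prev_box, deltas))


# Initial box: a reference point (x, y) plus w = h = 0.1, as described in A.4.
box = (0.4, 0.6, 0.1, 0.1)

# Hypothetical offsets that a box head might predict at two decoder layers.
layer_deltas = [(0.1, -0.2, 0.5, 0.3), (0.0, 0.1, -0.1, 0.2)]

boxes = []
for deltas in layer_deltas:
    box = refine_box(box, deltas)  # all four coordinates are carried forward
    boxes.append(box)
    print(tuple(round(v, 4) for v in box))
```

In this reading, the width and height at each layer depend on the previous layer's width and height. If the reference only ever held (x, y), the last two coordinates would instead come from `sigmoid(delta)` alone at every layer, which is the discrepancy I am asking about.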

I will be very grateful if you can give me some hints.

tangjiuqi097 commented 3 years ago

Hi, you can refer to these lines.

diegodibe commented 2 years ago

Hi, I also have a problem understanding that part. Specifically, in the case in which the reference points are computed by a linear projection of the query embedding (two_stage=False), the predicted centers are refined iteratively. However, I don't understand the need to detach the new reference points: wouldn't this leave no gradient to update the embeddings? Am I wrong? I would appreciate any hint.
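To make the question concrete, here is a toy sketch of what detaching a reference does to the gradient. It uses a hand-written forward-mode derivative instead of PyTorch so that it runs anywhere; `Dual`, `refine`, the scalar `theta`, and the factor 0.5 are all hypothetical stand-ins, and `Dual.detach` only mimics the role of `Tensor.detach()`. It does not reproduce the repository's decoder.

```python
import math
from dataclasses import dataclass


@dataclass
class Dual:
    """A value together with its derivative w.r.t. one scalar parameter."""
    val: float
    grad: float = 0.0

    def __add__(self, other: "Dual") -> "Dual":
        return Dual(self.val + other.val, self.grad + other.grad)

    def scale(self, k: float) -> "Dual":
        return Dual(k * self.val, k * self.grad)

    def detach(self) -> "Dual":
        # Same value, but treated as a constant from here on.
        return Dual(self.val, 0.0)


def sigmoid(x: Dual) -> Dual:
    s = 1.0 / (1.0 + math.exp(-x.val))
    return Dual(s, s * (1.0 - s) * x.grad)


def inverse_sigmoid(p: Dual) -> Dual:
    # d/dp log(p / (1 - p)) = 1 / (p * (1 - p))
    return Dual(math.log(p.val / (1.0 - p.val)), p.grad / (p.val * (1.0 - p.val)))


def refine(theta: Dual, detach: bool) -> Dual:
    ref = sigmoid(theta)       # reference derived from the "embedding" theta
    delta = theta.scale(0.5)   # stand-in for a head output that also depends on theta
    if detach:
        ref = ref.detach()     # cut the gradient path that runs through the reference
    return sigmoid(delta + inverse_sigmoid(ref))


theta = Dual(0.2, 1.0)  # grad = 1.0 seeds d(theta)/d(theta)
with_detach = refine(theta, detach=True)
without_detach = refine(theta, detach=False)
print(with_detach.grad, without_detach.grad)
```

In this toy setup the detached version still has a nonzero derivative, because the offset path is untouched; only the contribution that flows through the reference is removed. Whether the same holds for the query embeddings in the actual model depends on where else the undetached reference points are used, which is what I would like to confirm.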

EMU1337X commented 2 years ago

Did you figure out the reason? I'm also stuck here.
