Hi, I appreciate your work on Cascade DETR. However, I am a bit confused by the description of the predicted bounding boxes in this statement: "Si is the set of 2D locations inside the predicted bounding box Bi from the preceding decoder layer i." Do you mean the predicted boxes of all queries, or only the boxes matched to ground truth? The predicted boxes Bi of each decoder layer should have size (Number_of_queries, 4), and at that point they have not been matched by the Hungarian matcher. As far as I know, the matcher is only run once, right before the criterion (loss) is computed. So, to achieve cascade attention, do we need to apply matching to the box output of every decoder layer? Looking forward to your reply.
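To make the question concrete, here is how I would currently read the statement if Bi means the boxes of *all* queries, with no matching at all. This is only my own sketch, not your implementation. I assume DETR-style normalized (cx, cy, w, h) boxes, an H×W feature map, and a hypothetical fallback that lets a query attend everywhere when its box covers no feature-map cell. The function name `box_attention_mask` is made up for illustration.

```python
import numpy as np


def box_attention_mask(boxes_cxcywh: np.ndarray, height: int, width: int) -> np.ndarray:
    """Return a (Q, H*W) boolean mask; True means the location is inside query q's box.

    Assumption: boxes_cxcywh has shape (Q, 4) in normalized (cx, cy, w, h) format,
    i.e. one box per query straight from the previous decoder layer (no Hungarian matching).
    """
    cx, cy, w, h = boxes_cxcywh.T  # each has shape (Q,)
    x0, x1 = cx - w / 2, cx + w / 2
    y0, y1 = cy - h / 2, cy + h / 2

    # Normalized centers of the feature-map cells, flattened row-major to index i * W + j.
    xs = (np.arange(width) + 0.5) / width
    ys = (np.arange(height) + 0.5) / height
    gx, gy = np.meshgrid(xs, ys)  # both (H, W): gx[i, j] = xs[j], gy[i, j] = ys[i]
    gx = gx.reshape(-1)
    gy = gy.reshape(-1)

    inside = (
        (gx[None, :] >= x0[:, None]) & (gx[None, :] <= x1[:, None])
        & (gy[None, :] >= y0[:, None]) & (gy[None, :] <= y1[:, None])
    )

    # Hypothetical fallback: a box that contains no cell center keeps full attention.
    empty = ~inside.any(axis=1)
    inside[empty] = True
    return inside


boxes = np.array([[0.5, 0.5, 0.5, 0.5],     # covers the central 2x2 cells of a 4x4 map
                  [0.1, 0.1, 0.01, 0.01]])  # too small to contain any cell center
mask = box_attention_mask(boxes, height=4, width=4)
print(mask.shape, mask.sum(axis=1))
```

Under this reading, every one of the Number_of_queries boxes produces its own mask row, so nothing needs to be matched before the next layer. Note that some attention APIs (e.g. a boolean `attn_mask` in PyTorch's `nn.MultiheadAttention`) treat True as "not allowed", so the mask would need to be inverted before being passed in.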