Hi, I love the detrex project, great work! I am reading the DINO code, and I have 2 questions about the denoising design:
When building the denoising queries, the code pads them with zeros so that every image in the batch has the same number of queries. In the attention computation, however, the zero-padded queries still seem to take part, because the attention mask does not account for the padding. Since DINO and DN-DETR both perform very well, I wonder how much this design actually affects the network. Have you tried masking out the padded queries?
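To make the question concrete, here is a minimal sketch of the kind of per-image mask I have in mind. It is not detrex code: the function name `build_padded_dn_mask` and its arguments are hypothetical, it assumes a single denoising group laid out as `[denoising slots | matching queries]`, and it uses plain Python lists instead of tensors. `True` means "this key is hidden from this query".

```python
def build_padded_dn_mask(num_gt_per_image, num_matching_queries):
    """Build one boolean attention mask per image (hypothetical helper).

    Layout per image: [denoising slots padded to max_gt | matching queries].
    mask[q][k] is True when query q must NOT attend to key k.
    """
    max_gt = max(num_gt_per_image)
    total = max_gt + num_matching_queries
    masks = []
    for num_gt in num_gt_per_image:
        mask = [[False] * total for _ in range(total)]
        for q in range(total):
            is_matching_query = q >= max_gt
            for k in range(max_gt):
                is_padded_key = k >= num_gt
                # Usual denoising rule: matching queries never see denoising slots.
                # Extra rule under discussion: no query sees a zero-padded slot.
                if is_matching_query or is_padded_key:
                    mask[q][k] = True
        masks.append(mask)
    return masks


# Two images with 2 and 1 ground-truth boxes, plus 2 matching queries each.
masks = build_padded_dn_mask([2, 1], num_matching_queries=2)
for row in masks[1]:
    print(row)
```

In the second image, the column of the padded slot is blocked for every query, while the padded row itself can still attend to the real queries, so its softmax stays well defined. The mask differs per image, so it can no longer be a single 2D mask shared by the whole batch. With PyTorch's `nn.MultiheadAttention`, for example, it would have to be passed as a 3D `attn_mask` of shape `(batch_size * num_heads, L, S)`.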