I have a question about the modulated HW attention part of DAB-DETR. At line 238 of transformer.py, query_sine_embed is sliced so that it keeps only the embeddings of x and y. Shouldn't the modulated HW attention at lines 243 and 244 operate on the embeddings of w and h instead? Why does it still operate on the embeddings of x and y? I find this confusing.
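
To make the question concrete, here is a minimal NumPy sketch of how I read that step. It is not the code from transformer.py: the helper names `sine_embed` and `modulated_xy_embed`, the tiny feature size, and the layout (y half first, then x half) are my own assumptions for illustration. In this reading, w and h never get their own embedding in the cross-attention query; they only show up as scalar factors that rescale the x and y embeddings.

```python
import numpy as np

def sine_embed(coord, num_feats=4, temperature=10000.0):
    """Sinusoidal embedding of a single scalar coordinate in [0, 1]."""
    # Frequencies come in pairs so that sin and cos share the same frequency.
    dim_t = temperature ** (2 * (np.arange(num_feats) // 2) / num_feats)
    angles = coord * 2 * np.pi / dim_t
    out = np.empty(num_feats)
    out[0::2] = np.sin(angles[0::2])  # even slots: sine
    out[1::2] = np.cos(angles[1::2])  # odd slots: cosine
    return out

def modulated_xy_embed(anchor, ref_wh, num_feats=4):
    """Hypothetical helper: scale the x/y embeddings by width/height ratios.

    anchor : (x, y, w, h) of the current anchor box
    ref_wh : (w_ref, h_ref), reference size predicted from the query content
    """
    x, y, w, h = anchor
    w_ref, h_ref = ref_wh
    emb_x = sine_embed(x, num_feats) * (w_ref / w)  # x embedding scaled by width ratio
    emb_y = sine_embed(y, num_feats) * (h_ref / h)  # y embedding scaled by height ratio
    # Assumed layout: y half first, then x half. No w/h embedding is kept.
    return np.concatenate([emb_y, emb_x])

# Example: reference width is twice the anchor width, heights are equal.
query = modulated_xy_embed(anchor=(0.5, 0.5, 0.2, 0.4), ref_wh=(0.4, 0.4))
```

If this reading is right, the embeddings of w and h are dropped at the slicing step and only the ratios w_ref / w and h_ref / h are applied. Is that the intended behavior?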