Open Artificial-Inability opened 1 year ago
1/wq makes sure the attention maps have a shape similar to that of the anchor boxes. For example, a large w results in a flattened attention map along the x direction under the 1/wq formulation. We provide some visualizations in our paper.
href and wref are designed to keep the same dimension as hq and wq. This helps the final performance.
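To make that concrete, here is a minimal numerical sketch (my own, not the repository's code) of the x-term of the modulated positional attention, i.e. PE(x_q)·PE(x_k)·wref/wq followed by a softmax over key positions. The sinusoidal PE and all constants are illustrative assumptions; the point is only that a larger wq shrinks the positional logits and therefore spreads (flattens) the resulting attention along x:

```python
# Toy sketch of the x-term of the modulated positional attention:
#   logit(x_k) = PE(x_q) . PE(x_k) * (w_ref / w_q) / sqrt(D),  attn = softmax(logits)
# The PE definition, position range, and all constants below are assumptions
# made only for illustration.
import numpy as np

def sinusoidal_pe(pos, dim=64, temperature=100.0):
    """Sinusoidal encoding of a scalar position (one vector of length `dim`)."""
    freqs = temperature ** (2.0 * np.arange(dim // 2) / dim)
    angles = pos / freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def modulated_attn_x(x_q, w_q, key_xs, w_ref=1.0, dim=64):
    """Attention weights over key positions along x for one query anchor."""
    q = sinusoidal_pe(x_q, dim)
    logits = np.array([q @ sinusoidal_pe(x_k, dim) for x_k in key_xs])
    logits *= (w_ref / w_q) / np.sqrt(dim)      # the w_ref / w_q modulation
    probs = np.exp(logits - logits.max())       # softmax over key positions
    return probs / probs.sum()

key_xs = np.linspace(0.0, 50.0, 201)            # key positions along the x axis
for w_q in (1.0, 3.0):
    attn = modulated_attn_x(x_q=25.0, w_q=w_q, key_xs=key_xs)
    entropy = -(attn * np.log(attn)).sum()      # higher entropy = flatter, wider map
    print(f"w_q = {w_q}: peak = {attn.max():.4f}, entropy = {entropy:.3f}")
# With w_q = 3 the peak drops and the entropy rises, i.e. the attention is
# spread out (flattened) along x, matching the description above.
```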
Could you give a more detailed explanation of how this works? My personal understanding of the "H=1, W=3" in Figure 6 of the DAB paper is that href/hq = 1 and wref/wq = 3, in which case a larger wq would lead to a smaller W. If I misunderstood something, what is the definition of H and W in Figure 6? Thanks.
The results in Fig 6 are examples. "H=1, W=3" means hq = 1, wq = 3. We suppose href and wref are 1.
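Read literally (my own worked reading of that example, using the stated assumption href = wref = 1), the two modulation ratios become:

```latex
% Plugging h_q = 1, w_q = 3, h_ref = w_ref = 1 into the modulation ratios
\[
\frac{w_{ref}}{w_q} = \frac{1}{3}, \qquad \frac{h_{ref}}{h_q} = 1,
\]
\[
\text{ModulateAttn} \;\propto\; \tfrac{1}{3}\,\mathrm{PE}(x)\cdot\mathrm{PE}(x_{ref})
\;+\; 1\cdot\mathrm{PE}(y)\cdot\mathrm{PE}(y_{ref}).
\]
```

So the horizontal similarity term is down-weighted by 1/3, which lets the attention spread out more along x than along y and gives the kind of wide, flat pattern Figure 6 shows for that example.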
I couldn't understand this phenomenon theoretically. If the original value of the attention map at a fixed point is calculated by PE(x)·PE(xref)·wref/wq + ..., then when we increase wq to wq' = 3*wq, the new value should decrease, which should result in a narrower attention map. Could you explain theoretically, from the formulation, why a larger wq leads to a wider attention map? Thanks.
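For what it's worth, here is a self-contained toy check of that calculation (illustrative numbers of my own, with wref = 1): tripling wq does shrink every raw term PE(x)·PE(xref)·wref/wq, but the attention map is the softmax of those terms over all key positions, and uniformly shrinking the logits flattens the normalized map rather than narrowing it:

```python
# Toy numbers standing in for PE(x_q).PE(x_k) at five key positions along x.
import numpy as np

raw = np.array([4.0, 3.0, 1.0, 0.2, 0.1])
for w_q in (1.0, 3.0):
    logits = raw * (1.0 / w_q)                    # w_ref / w_q modulation, w_ref = 1
    attn = np.exp(logits) / np.exp(logits).sum()  # softmax over key positions
    print(f"w_q = {w_q}: max logit = {logits.max():.2f}, attn = {np.round(attn, 3)}")
# Every logit is smaller for w_q = 3, yet the normalized map is flatter
# (roughly 0.38, 0.27, 0.14, 0.11, 0.10) rather than more peaked
# (roughly 0.69, 0.25, 0.03, 0.02, 0.01).
```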
I have two questions about the hw-modulated attention equation (Eq. (6) in DAB-DETR):