Closed zaqai closed 10 months ago
As we use rectangle window attention, `dec_win_size` is related to the spatial dimensions of point queries.
The choice of `dec_win_size`: Suppose the spatial dimensions of sparse point queries are `[sH, sW]`, then `dec_win_size` can be set to `[sW/2, sH/4]`. In this case, it is suggested that `sH` is divisible by 4 and `sW` is divisible by 2. In addition, we still use `features['8x']` when K=10.
The value of stride K: Theoretically, stride K can be set to any value. However, different choices of K will result in varied spatial dimensions of point queries, which may increase the difficulty of implementing rectangle window attention.
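As a rough illustration of the `dec_win_size` rule above, here is a minimal sketch. The helper name `suggest_dec_win_size` is hypothetical and not part of the repository; it only encodes the `[sW/2, sH/4]` suggestion and the divisibility recommendation from this reply. How `[sH, sW]` follows from K and the input size is not covered here.

```python
def suggest_dec_win_size(sH: int, sW: int) -> list[int]:
    """Return the suggested dec_win_size [sW/2, sH/4] for sparse point
    queries with spatial dimensions [sH, sW].

    Hypothetical helper for illustration only; not the repository's API.
    """
    if sH <= 0 or sW <= 0:
        raise ValueError("sH and sW must be positive")
    # The reply suggests sH divisible by 4 and sW divisible by 2,
    # so that both window dimensions are whole numbers.
    if sH % 4 != 0 or sW % 2 != 0:
        raise ValueError(
            f"expected sH % 4 == 0 and sW % 2 == 0, got sH={sH}, sW={sW}"
        )
    # Width comes first, matching the order [sW/2, sH/4] given in the reply.
    return [sW // 2, sH // 4]


print(suggest_dec_win_size(16, 32))  # [16, 4]
```

If a given K produces query dimensions that fail the check, that is the case where implementing rectangle window attention gets harder.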
I found that K, _dec_winsize, and the dimensions of the backbone features are related to each other in the decoder, and it is difficult to keep them consistent. Can you tell me the corresponding _dec_winsize when K=10 in your Supplementary, and whether it still uses features['8x']? In addition to K=4, 8, 10, 16, can we use other values that are less than 16? Thanks a lot!