cxliu0 / PET

[ICCV 2023] Point-Query Quadtree for Crowd Counting, Localization, and More
MIT License

Questions about stride K and decoder window size. #5

Closed zaqai closed 9 months ago

zaqai commented 11 months ago

I found that K, _dec_winsize, and the dimensions of the backbone features are interrelated in the decoder, which makes them difficult to adjust together. Could you tell me the _dec_winsize that corresponds to K=10 in your Supplementary, and whether it still uses features['8x']? Besides K=4, 8, 10, 16, can we use other values less than 16? Thanks a lot!

cxliu0 commented 11 months ago

As we use rectangle window attention, dec_win_size is related to the spatial dimensions of point queries.

  1. The choice of dec_win_size: Suppose the spatial dimensions of the sparse point queries are [sH, sW]; then dec_win_size can be set to [sW/2, sH/4]. For this to work, sH should be divisible by 4 and sW should be divisible by 2. In addition, we still use features['8x'] when K=10.

  2. The value of stride K: Theoretically, stride K can be set to any value. However, different choices of K produce different spatial dimensions of point queries, which may make rectangle window attention harder to implement.
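
The rule in point 1 can be sketched as a small check. This is only an illustration, not code from the PET repository: the helper names `query_grid` and `suggest_dec_win_size` are hypothetical, and the sketch assumes the sparse point queries form a grid of size [H // K, W // K] for an H x W input, which the reply above does not state explicitly. The 256 x 256 input size is likewise just an example value.

```python
def query_grid(height, width, stride):
    """Spatial dimensions [sH, sW] of the sparse point queries.

    Assumption (not confirmed in this thread): one query per
    stride x stride cell, so sH = height // stride, sW = width // stride.
    """
    return height // stride, width // stride


def suggest_dec_win_size(sH, sW):
    """Return [sW/2, sH/4] as suggested above, or raise if not divisible."""
    if sH % 4 != 0:
        raise ValueError(f"sH={sH} is not divisible by 4")
    if sW % 2 != 0:
        raise ValueError(f"sW={sW} is not divisible by 2")
    return [sW // 2, sH // 4]


if __name__ == "__main__":
    # Hypothetical 256 x 256 input, tried with several strides K.
    for K in (4, 8, 10, 16):
        sH, sW = query_grid(256, 256, K)
        try:
            print(K, (sH, sW), suggest_dec_win_size(sH, sW))
        except ValueError as err:
            print(K, (sH, sW), "needs special handling:", err)
```

Under these assumptions, K=4, 8, and 16 give query grids that satisfy the divisibility condition, while K=10 gives a 25 x 25 grid that does not. That is one way a choice of K can make rectangle window attention harder to implement, as point 2 notes.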