Thank you for your excellent research! I have a question about Fig. 2, specifically the memory usage for local descriptors. During Global Retrieval, the model outputs 1200 local feature tokens but selects 500 tokens with the highest attention values. However, in Table 3, the memory usage calculation seems to consider only 500 tokens, not the peak of 1200. Could you clarify if I'm misunderstanding something? Thank you!
During global retrieval, only the class token is stored and used. During reranking, only the 500 selected tokens are kept in memory; the remaining tokens are neither stored nor used.
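To make the distinction concrete, here is a rough sketch of the idea, not the actual code from the repository. The descriptor dimension (`DIM = 128`) and the storage precision (`BYTES_PER_VALUE = 2`) are assumptions chosen only for illustration, and the function names are hypothetical. It shows that the 1200 output tokens exist only transiently during extraction, while the per-image storage cost depends only on the 500 tokens that are kept.

```python
import heapq
import random

def select_top_tokens(tokens, attention, k):
    """Return the k tokens with the highest attention scores."""
    top_idx = heapq.nlargest(k, range(len(tokens)), key=lambda i: attention[i])
    return [tokens[i] for i in top_idx]

def descriptor_bytes(num_tokens, dim, bytes_per_value):
    """Size in bytes of num_tokens descriptors of length dim."""
    return num_tokens * dim * bytes_per_value

NUM_OUTPUT_TOKENS = 1200  # local tokens produced by the model (from the question)
NUM_KEPT_TOKENS = 500     # tokens kept for reranking (from the question)
DIM = 128                 # ASSUMPTION: illustrative descriptor dimension
BYTES_PER_VALUE = 2       # ASSUMPTION: illustrative precision (e.g. 16-bit floats)

# Random stand-ins for the model outputs.
random.seed(0)
tokens = [[random.random() for _ in range(DIM)] for _ in range(NUM_OUTPUT_TOKENS)]
attention = [random.random() for _ in range(NUM_OUTPUT_TOKENS)]

# Only the selected tokens are written to the database; the rest are discarded.
kept = select_top_tokens(tokens, attention, NUM_KEPT_TOKENS)

stored = descriptor_bytes(len(kept), DIM, BYTES_PER_VALUE)
transient = descriptor_bytes(NUM_OUTPUT_TOKENS, DIM, BYTES_PER_VALUE)
print(f"stored per image: {stored} bytes, transient at extraction: {transient} bytes")
```

Under these assumed sizes, the transient buffer is needed for one image at a time during extraction, whereas the stored cost is multiplied by the number of database images, which is why a per-image storage figure would count only the kept tokens.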