Official implementation of paper "SparseVLM: Visual Token Sparsification for Efficient Vision-Language Model Inference" proposed by Peking University and UC Berkeley.
Hello, and thank you for your wonderful research.
I understand that the number of tokens pruned depends on the value of lambda multiplied by the rank, while the number of recycled tokens is controlled by the hyperparameter tau.
However, Table 1 of the paper shows the number of visual tokens fixed at 192, 128, and 64.
Could you please clarify whether these token counts were hardcoded to select exactly 192, 128, or 64 visual tokens, or whether some other mechanism was used to maintain a fixed token count in these experiments?
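To make the question concrete, here is a minimal sketch of the "hardcoded budget" interpretation I have in mind: instead of letting lambda * rank and tau determine a variable number of surviving tokens, one would simply keep the top-K tokens by importance score. The function name, score source, and token counts below are my assumptions for illustration, not code from this repository.

```python
import numpy as np

def select_fixed_budget(scores: np.ndarray, budget: int) -> np.ndarray:
    """Hypothetical sketch: keep exactly `budget` visual tokens,
    chosen by descending importance score, returned in original order."""
    # indices of the `budget` highest-scoring tokens
    keep = np.argsort(scores)[::-1][:budget]
    # restore the tokens' original sequence order
    return np.sort(keep)

# e.g. 576 visual tokens from a CLIP-ViT encoder, with made-up scores
scores = np.random.rand(576)
kept = select_fixed_budget(scores, budget=192)
```

Is this roughly what was done for the 192/128/64 settings, or does the adaptive lambda * rank / tau rule still run, with its thresholds tuned so the counts happen to land on those values?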
Thank you. Sincerely,