FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0

Why is the number of tokens in the LLM dynamic? #18

Closed · liuting20 closed this 2 months ago

liuting20 commented 3 months ago

We found that the number of tokens fed into the LLM is dynamic. Also, what is the single token of shape [batch, 1, channel]?
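For context, here is a minimal sketch of how a dynamic LLM input length and a separate [batch, 1, channel] token could coexist, assuming the variable length comes from a per-image number of region tokens. All function and variable names below are hypothetical and are not taken from the Groma codebase.

```python
# Minimal sketch (not from the Groma codebase): one plausible reason the LLM
# input length varies is that the number of visual/region tokens depends on
# how many region proposals survive filtering for each image.
import torch

def build_llm_inputs(text_embeds, region_embeds, global_embed):
    """Concatenate per-image global, region, and text embeddings.

    text_embeds:   [batch, n_text, channel] fixed-length text tokens.
    region_embeds: list of length batch; entry i is [n_regions_i, channel],
                   where n_regions_i varies per image -> dynamic length.
    global_embed:  [batch, 1, channel] one summary token per image
                   (the kind of single token asked about above).
    """
    batch = text_embeds.shape[0]
    sequences = []
    for i in range(batch):
        seq = torch.cat([
            global_embed[i],   # [1, channel] single token
            region_embeds[i],  # variable number of region tokens
            text_embeds[i],    # fixed-length text tokens
        ], dim=0)
        sequences.append(seq)
    # Pad to the longest sequence in the batch so shapes align.
    return torch.nn.utils.rnn.pad_sequence(sequences, batch_first=True)
```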

machuofan commented 3 months ago

Sorry, I'm a bit confused by the question. Could you please show me the code segment you have questions about?