FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0

About grounding output #12

Closed nguyenquivinhquang closed 4 months ago

nguyenquivinhquang commented 4 months ago

Thanks for your wonderful work. I want to ask which part of the code corresponds to the grounded output described in Section 3.2 of the paper. [image attached]

machuofan commented 4 months ago

Thanks for your interest in our work. The code for preparing grounded output can be found in the dataset definitions, e.g., flickr.py.
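
For context, such dataset definitions typically convert annotated (phrase, box) pairs into a caption interleaved with grounding tokens. Below is a minimal sketch of that conversion, assuming phrase spans are sorted and non-overlapping; the token names (`<p>`, `</p>`, `<roi>`, `<rK>`) and the helper `build_grounded_caption` are illustrative assumptions, not the repo's exact implementation.

```python
def build_grounded_caption(caption, phrase_spans, phrase_to_regions):
    """Insert grounding tokens into a plain caption.

    caption:           plain text, e.g. "A man throws a frisbee to his dog."
    phrase_spans:      [(start, end), ...] character spans of grounded phrases,
                       sorted and non-overlapping
    phrase_to_regions: {phrase_index: [region_id, ...]} matched region ids
    """
    pieces, cursor = [], 0
    for i, (start, end) in enumerate(phrase_spans):
        pieces.append(caption[cursor:start])  # untouched text before the phrase
        phrase = caption[start:end]
        regions = "".join(f"<r{k}>" for k in phrase_to_regions.get(i, []))
        # Wrap the grounded phrase and append its matched region token(s).
        pieces.append(f"<p>{phrase}</p><roi>{regions}</roi>")
        cursor = end
    pieces.append(caption[cursor:])  # trailing text after the last phrase
    return "".join(pieces)


print(build_grounded_caption(
    "A man throws a frisbee to his dog.",
    [(0, 5), (13, 22)],
    {0: [3], 1: [7]},
))
# -> "<p>A man</p><roi><r3></roi> throws <p>a frisbee</p><roi><r7></roi> to his dog."
```

The resulting string is what the model is trained to emit as grounded output: free-form text in which each referenced phrase is tied to region tokens produced by the localized visual tokenizer.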