FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0
483 stars, 55 forks

Tested some images; grounding ability seems much weaker than the original DINO? #13

Closed: TiantZhang closed this issue 1 month ago

machuofan commented 1 month ago

Could you please provide more details on this issue?