[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
On the images I tested, the grounding ability seems much weaker than the original DINO's? #13
Closed by TiantZhang 1 month ago
Could you please provide more details on this issue?