FoundationVision / Groma

[ECCV2024] Grounded Multimodal Large Language Model with Localized Visual Tokenization
https://groma-mllm.github.io/
Apache License 2.0
483 stars, 55 forks

evaluation results significantly different #7

Closed (xiaoyazhu closed this 2 months ago)

xiaoyazhu commented 2 months ago

Hello, is the provided model the best one? Why are the evaluation results I obtained significantly different from the reported results? The method I used is as follows:

[image]

machuofan commented 2 months ago

The low performance was caused by a misconfiguration of the hyperparameter nms_thres, which was set to 0.0 but should be 0.6. This is now fixed in the latest commit. Please feel free to give it a try.
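For readers wondering why this one value matters so much, here is a minimal sketch. It assumes nms_thres is used as the IoU threshold of a standard greedy non-maximum suppression step, which the thread does not confirm. The `iou` and `nms` functions and the toy boxes are hypothetical illustrations, not Groma's actual code. Under that assumption, a threshold of 0.0 suppresses every box that overlaps a higher-scoring box at all, so valid neighbouring detections are lost.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thres):
    """Greedy NMS: keep a box unless its IoU with an already kept,
    higher-scoring box exceeds iou_thres. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thres for j in keep):
            keep.append(i)
    return keep


# Hypothetical toy data: box 2 nearly duplicates box 0, box 1 only
# touches box 0 at a corner, and box 3 is far away from the rest.
boxes = [(0, 0, 10, 10), (8, 8, 18, 18), (1, 1, 11, 11), (30, 30, 40, 40)]
scores = [0.9, 0.8, 0.7, 0.6]

print(nms(boxes, scores, 0.0))  # [0, 3]: box 1 is dropped for a tiny overlap
print(nms(boxes, scores, 0.6))  # [0, 1, 3]: only the near-duplicate box 2 is dropped
```

In this toy case the 0.0 setting discards box 1, a distinct detection that barely overlaps box 0, while 0.6 removes only the near-duplicate. That behaviour would be consistent with the much lower evaluation numbers reported above.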