EvolvingLMMs-Lab / lmms-eval

Accelerating the development of large multimodal models (LMMs) with lmms-eval
https://lmms-lab.github.io/

Add REC tasks for testing a model's ability to localize (ground) objects given a description. This adds REC for all RefCOCO datasets. #52

Closed: hunterheiden closed this 2 months ago

hunterheiden commented 2 months ago

This PR adds REC (referring expression comprehension) evaluations, assuming the model outputs bounding boxes as raw coordinates inside square brackets. It adds REC evaluation for all RefCOCO sets (RefCOCO, RefCOCO+, RefCOCOg). Specifically:
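Because the evaluation assumes boxes come back as bracketed coordinates in plain text, the model's answer has to be parsed before scoring. A minimal sketch, assuming the box is four numbers in `[x1, y1, x2, y2]` order; `parse_box` is a hypothetical helper, not the PR's actual code:

```python
import re
from typing import List, Optional

# Matches a bracketed group of exactly four numbers, e.g. "[0.12, 0.30, 0.55, 0.91]".
# Assumption: the model emits the box as [x1, y1, x2, y2].
_BOX_PATTERN = re.compile(
    r"\[\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*,"
    r"\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\]"
)


def parse_box(text: str) -> Optional[List[float]]:
    """Return the first four-number bracketed box found in `text`, or None."""
    match = _BOX_PATTERN.search(text)
    if match is None:
        return None  # No parseable box; the caller can score this as a miss.
    return [float(g) for g in match.groups()]
```

For example, `parse_box("The object is at [0.12, 0.30, 0.55, 0.91].")` returns `[0.12, 0.3, 0.55, 0.91]`. Returning `None` rather than raising lets malformed answers be counted as failures instead of crashing the run.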

hunterheiden commented 2 months ago

For sure, happy to provide these results and details of my setup. I ran this with torch 2.1.2 + CUDA 12.4 on an Azure VM (Standard NC96ads A100 v4: 96 vCPUs, 880 GiB memory, 4x A100 80GB).

For quick reference, here are the results (Acc@IoU=0.5):

Dataset    Split   liuhaotian/llava-v1.5-7b (Acc@IoU=0.5, %)
RefCOCO    val     56.2
RefCOCO    test    58.1
RefCOCO    testA   64.4
RefCOCO    testB   47.5
RefCOCO+   val     50.0
RefCOCO+   testA   59.2
RefCOCO+   testB   39.0
RefCOCOg   val     48.8
RefCOCOg   test    48.4
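The metric in the table counts a prediction as correct when its IoU with the ground-truth box reaches 0.5. A minimal sketch of that computation, assuming `[x1, y1, x2, y2]` boxes and treating unparseable predictions as misses; `iou` and `acc_at_iou` are hypothetical names, and the `>=` comparison is an assumption (some implementations use a strict `>`):

```python
from typing import List, Optional, Sequence

Box = Sequence[float]  # [x1, y1, x2, y2], with x1 <= x2 and y1 <= y2


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes have no (negative) intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def acc_at_iou(preds: List[Optional[Box]], gts: List[Box], threshold: float = 0.5) -> float:
    """Percentage of predictions whose IoU with the ground truth is >= threshold.

    Predictions that could not be parsed (None) count as misses.
    """
    hits = sum(
        1 for p, g in zip(preds, gts) if p is not None and iou(p, g) >= threshold
    )
    return 100.0 * hits / len(gts)
```

With one exact match, one unparseable answer, and one box at IoU 1/7, `acc_at_iou` reports about 33.3.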

Additionally, I'll attach the result summaries (as .txt files) for the different RefCOCO variants:

Let me know if you'd like me to screenshot the results directly, or if this is sufficient information! I have more REC tasks I'd like to contribute, as well as some evaluations on screen-based benchmarks (ScreenSpot, RICO tasks, etc.).

I'm also running some of these on the v1.5-13b model to see how the results shift, so I'll circle back with those numbers when I have them.

Luodian commented 2 months ago

Thanks for your PR! We are reviewing it and will discuss the changes to the main pipeline (the files inside the apis folder). Once your 13b results are ready, you can post them here as well, and we will try to finish this PR ASAP.

hunterheiden commented 2 months ago

Sounds good. I should have the 13B model benchmarked in the next day or two.

Regarding the core file changes, if there's another way to achieve similar functionality that's more in line with how you all manage this package, I'm happy to change approaches. The main issue was that I needed the width and height of each image in order to normalize bounding boxes. I also wanted to avoid reloading the datasets multiple times, especially for splits that aren't needed.
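To see why image dimensions are needed: RefCOCO annotations follow the COCO convention of a pixel-space `[x, y, w, h]` box, while a model answering with normalized coordinates works in `[0, 1]`. A minimal sketch of the conversion, assuming the model's output is `[x1, y1, x2, y2]` normalized by the original image size; `coco_to_normalized_xyxy` is a hypothetical helper, and it ignores any padding or resizing the model applies during preprocessing:

```python
from typing import List, Sequence


def coco_to_normalized_xyxy(bbox: Sequence[float], width: int, height: int) -> List[float]:
    """Convert a COCO-style pixel box [x, y, w, h] to [x1, y1, x2, y2] in [0, 1].

    Assumption: predictions are normalized by the original image width/height,
    so the ground truth must be divided by the same per-image dimensions.
    """
    x, y, w, h = bbox
    return [x / width, y / height, (x + w) / width, (y + h) / height]
```

For a 400x200 image, the COCO box `[100, 50, 200, 100]` becomes `[0.25, 0.25, 0.75, 0.75]`. Since every ground-truth box needs its own image's size, keeping that size alongside each sample avoids opening the image again at scoring time.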

Luodian commented 2 months ago

@kcz358 @jzhang38 Please help review this, thanks!