Add REC tasks for testing model ability to locally ground objects, given a description. This adds REC for all RefCOCO datasets.

hunterheiden commented 2 months ago

This PR adds REC evaluations, assuming that bounding boxes will be output as raw square brackets and coordinates. It adds REC eval for all RefCOCO sets (RefCOCO, RefCOCO+, RefCOCOg). Specifically:

- .yaml files are added, both for the shared defaults and for the individual splits
- util_rec.py is added for each set (identical across the RefCOCO sets)
- minor modifications to Task change how no_image_dataset is created. I've tried to do this more efficiently so that the dataset can be exploded into one row per answer/bounding-box pair, instead of one row holding a set of answers for a single bounding box.
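
To make the two ideas above concrete, here is a minimal sketch (not the code in this PR) of reading a bounding box from raw square-bracket output and of exploding one box with several descriptions into one row per description. The function names and the `bbox` / `answer` field names are assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Optional

# Matches four numbers inside square brackets, e.g. "[0.1, 0.2, 0.5, 0.9]".
_NUM = r"(-?\d+(?:\.\d+)?)"
_BBOX_RE = re.compile(
    r"\[\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*,\s*" + _NUM + r"\s*\]"
)


def parse_bbox(text: str) -> Optional[List[float]]:
    """Return the first four-number bracketed list found in text, or None."""
    match = _BBOX_RE.search(text)
    if match is None:
        return None
    return [float(group) for group in match.groups()]


def explode_annotations(doc: Dict) -> List[Dict]:
    """Turn one box with many descriptions into one row per description.

    Assumes (hypothetically) that doc["bbox"] holds a single box and
    doc["answer"] holds a list of referring expressions for that box.
    """
    return [{"bbox": doc["bbox"], "answer": answer} for answer in doc["answer"]]
```

A response with no bracketed box yields `None`, which an evaluation would presumably count as a miss.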

hunterheiden commented 2 months ago

For sure, happy to provide these results and details of my setup. I ran this with torch 2.1.2 + CUDA 12.4 on an Azure VM: Standard NC96ads A100 v4 (96 vCPUs, 880 GiB memory, 4x A100 80GB).

Just as a short reference, these are the results for ACC@IoU=0.5:

Dataset	Split	liuhaotian/llava-v1.5-7b
RefCOCO	val	56.2
RefCOCO	test	58.1
RefCOCO	testA	64.4
RefCOCO	testB	47.5
RefCOCO+	val	50.0
RefCOCO+	testA	59.2
RefCOCO+	testB	39.0
RefCOCOg	val	48.8
RefCOCOg	test	48.4
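
For readers unfamiliar with the metric, ACC@IoU=0.5 is the share of predictions whose intersection-over-union with the ground-truth box reaches 0.5. A minimal sketch follows, assuming boxes are given as `[x1, y1, x2, y2]` and that a prediction exactly at the threshold counts as a hit; both are assumptions, and the helper names are illustrative rather than taken from this PR.

```python
from typing import Sequence


def iou(box_a: Sequence[float], box_b: Sequence[float]) -> float:
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def acc_at_iou(
    preds: Sequence[Sequence[float]],
    targets: Sequence[Sequence[float]],
    threshold: float = 0.5,
) -> float:
    """Fraction of predictions whose IoU with the target is >= threshold."""
    if not preds:
        return 0.0
    hits = sum(iou(p, t) >= threshold for p, t in zip(preds, targets))
    return hits / len(preds)
```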

Additionally, I'll attach the result summaries (as .txt files) for the different versions of COCO:

Let me know if you want me to directly screenshot / image the results, or if this is sufficient information! I have some more REC tasks I'd like to contribute, as well as some evaluations on screen-based benchmarks (ScreenSpot, RICO tasks, etc.).

I'm also running some of these on the v1.5-13b model to see the shift, so I'll circle back with those results when I have them.

Luodian commented 2 months ago

Thanks for your PR! We are checking it and will discuss the changes to the main pipeline (the files inside the apis folder). Once the 13b results are ready, you can post them here too, and we will try to finish this PR asap.

hunterheiden commented 2 months ago

Sounds good. I should hopefully have benchmarked the 13B model in the next day or two.

Regarding the core file changes, if there's another way to achieve similar functionality that's more in line with how you all manage this package, I'm happy to change approaches. The main issue was that I needed each image's width and height in order to normalize the bounding boxes. I also wanted to avoid re-loading the datasets multiple times, especially for splits that aren't needed.
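
As a rough illustration of why width and height are needed, here is a minimal sketch of the normalization step. It assumes the source boxes are in COCO-style pixel `[x, y, w, h]` form and that the target is `[x1, y1, x2, y2]` scaled to the 0-1 range; the function name and both conventions are assumptions, not necessarily what this PR does.

```python
from typing import List, Sequence


def normalize_bbox(bbox_xywh: Sequence[float], width: int, height: int) -> List[float]:
    """Convert a pixel [x, y, w, h] box to normalized [x1, y1, x2, y2].

    The x coordinates are divided by the image width and the y coordinates
    by the image height, which is why both are required per image.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image width and height must be positive")
    x, y, w, h = bbox_xywh
    return [x / width, y / height, (x + w) / width, (y + h) / height]
```

For example, `normalize_bbox([10, 20, 30, 40], width=100, height=200)` maps the box corners to fractions of the image size.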

Luodian commented 2 months ago

@kcz358 @jzhang38 Please help check it, thanks!

EvolvingLMMs-Lab / lmms-eval

Add REC tasks for testing model ability to locally ground objects, given a description. This adds REC for all RefCOCO datasets. #52