ChenDelong1999 / polite-flamingo

🦩 Visual Instruction Tuning with Polite Flamingo - training multi-modal LLMs to be both clever and polite! (AAAI-24 Oral)
https://arxiv.org/abs/2307.01003

Dataset-PointingQA #2

Closed IQ250 closed 1 year ago

IQ250 commented 1 year ago

Hello authors. Thanks for your effort in contributing this dataset, but I am having trouble matching the images in PF-1M to the open-source dataset [PointingQA]. For example,

/pointingqa-main/Datasets/LookTwiceQA/images_with_points_train/train_42636.jpg: how can I find this image in [PointingQA] or [Visual Genome]?

ChenDelong1999 commented 1 year ago

Hi,

Thank you for reaching out and bringing this issue to our attention. We sincerely apologize for any confusion that may have arisen.

For datasets such as PointQA and RefCOCOg, we create visual prompts (e.g., colored arrows or bounding boxes) based on the given annotations, for example:

[image]

The image index (e.g., 42636) does not correspond to the original VG dataset, and we will upload the rendered images soon.

IQ250 commented 1 year ago

I see. Thanks, and I look forward to your paper being accepted.

ChenDelong1999 commented 1 year ago

Hi @IQ250, the raw images, resized so that their shorter side is 336 pixels, have been uploaded to the Hugging Face dataset repository. You can access them here. The zip files contain the rendered image data for PointQA (LookTwiceQA and LocalQA), RefCOCOg, and the RET-3 remote sensing captioning datasets (RSITMD, RSICD, UCM).

Below, you'll find a table that maps each dataset's zip file to the corresponding image path in PF-1M:

| Dataset | Zip Folder | Image Path in PF-1M |
| --- | --- | --- |
| RefCOCOg | resized_images_refcocog.zip | /refcocog/images_with_bbox/ |
| PointQA - LookTwiceQA | resized_images_looktwice.zip | /pointingqa-main/Datasets/LookTwiceQA/images_with_points_train/ |
| PointQA - LocalQA | resized_images_localqa.zip | /pointingqa-main/Datasets/LocalQA/images_with_points/ |
| RET-3 | resized_images_rsitmd_rsicd_ucm.zip | /rsitmd_rsicd_ucm/images_no_tif/ |
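
As a rough illustration of how the table above could be used, here is a minimal Python sketch that takes an image path from PF-1M and reports which zip file should contain it. The function name `locate_image` and the dictionary `PREFIX_TO_ZIP` are hypothetical names made up for this example. The sketch also assumes that PF-1M paths contain the folder prefixes exactly as listed in the table; the internal folder layout of each zip file is not described here, so only the zip name and the image file name are returned.

```python
# Folder prefixes and zip names copied from the table above.
PREFIX_TO_ZIP = {
    "/refcocog/images_with_bbox/": "resized_images_refcocog.zip",
    "/pointingqa-main/Datasets/LookTwiceQA/images_with_points_train/": "resized_images_looktwice.zip",
    "/pointingqa-main/Datasets/LocalQA/images_with_points/": "resized_images_localqa.zip",
    "/rsitmd_rsicd_ucm/images_no_tif/": "resized_images_rsitmd_rsicd_ucm.zip",
}


def locate_image(pf1m_path):
    """Return (zip_name, file_name) for a PF-1M image path.

    Returns None when the path matches none of the known prefixes.
    This is a hypothetical helper, not part of the released code.
    """
    for prefix, zip_name in PREFIX_TO_ZIP.items():
        idx = pf1m_path.find(prefix)
        if idx != -1:
            # Everything after the prefix is treated as the image file name.
            return zip_name, pf1m_path[idx + len(prefix):]
    return None


print(locate_image(
    "/pointingqa-main/Datasets/LookTwiceQA/images_with_points_train/train_42636.jpg"
))
```

Matching with `find` rather than `startswith` is a deliberate choice so that paths carrying an extra leading directory would still resolve; whether PF-1M actually stores such paths is an assumption.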

I hope this helps! If you've got any more questions, just let me know.