Hi,
Thank you for reaching out and bringing this issue to our attention. We sincerely apologize for any confusion that may have arisen.
For datasets such as PointQA and RefCOCOg, we create visual prompts (e.g., colored arrows or bounding boxes) on the images based on the given annotations (a rough example is sketched below).
The image index (e.g., 42636) does not correspond to the original VG dataset, and we will upload the rendered images soon.
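For reference, a visual prompt of this kind could be rendered with Pillow roughly as follows; the file name, coordinates, and colors are illustrative placeholders, not the exact parameters used to build PF-1M:

```python
# Illustrative sketch only: draw a bounding-box prompt and an arrow-like point prompt.
# File name, coordinates, and colors are placeholders, not the actual PF-1M settings.
from PIL import Image, ImageDraw

image = Image.open("42636.jpg").convert("RGB")    # raw image (index is illustrative)
draw = ImageDraw.Draw(image)

# RefCOCOg-style region prompt: a colored box from an (x, y, w, h) annotation.
x, y, w, h = 50, 80, 120, 90                      # placeholder annotation
draw.rectangle([x, y, x + w, y + h], outline="red", width=4)

# PointQA-style point prompt: a short colored arrow ending at the annotated (px, py).
px, py = 200, 150                                 # placeholder point annotation
draw.line([px - 40, py - 40, px, py], fill="magenta", width=5)
draw.ellipse([px - 6, py - 6, px + 6, py + 6], fill="magenta")

image.save("42636_with_prompt.jpg")               # rendered image as released in the zip
```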
I see. Thanks, and I look forward to your paper being accepted.
Hi @IQ250 , the raw images, resized so that their shortest side is 336 pixels, have been uploaded to the Hugging Face dataset repository. You can access it here. This zip file contains the rendered image data for PointQA (LookTwiceQA and LocalQA), RefCOCOg, and the RET-3 remote sensing captioning datasets (RSITMD, RSICD, UCM).
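In case it is useful for sanity-checking on your side, a shortest-side resize like the one described can be done with Pillow roughly as follows; the interpolation filter and file names here are only an example, and the exact settings for the released images may differ:

```python
# Minimal sketch: rescale so the shortest side is 336 px while keeping the aspect ratio.
# The BICUBIC filter and file names are assumptions, not necessarily the release settings.
from PIL import Image

def resize_shortest_side(path: str, target: int = 336) -> Image.Image:
    image = Image.open(path).convert("RGB")
    width, height = image.size
    scale = target / min(width, height)           # factor mapping min(width, height) to `target`
    new_size = (round(width * scale), round(height * scale))
    return image.resize(new_size, Image.BICUBIC)

resize_shortest_side("example.jpg").save("example_resized.jpg")
```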
Below, you'll find a table that maps each dataset folder within the zip file to the corresponding image path in PF-1M:
| Dataset | Zip Folder | Image Path in PF-1M |
| --- | --- | --- |
| RefCOCOg | resized_images_refcocog.zip | /refcocog/images_with_bbox/ |
| PointQA - LookTwiceQA | resized_images_looktwice.zip | /pointingqa-main/Datasets/LookTwiceQA/images_with_points_train/ |
| PointQA - LocalQA | resized_images_localqa.zip | /pointingqa-main/Datasets/LocalQA/images_with_points/ |
| RET-3 | resized_images_rsitmd_rsicd_ucm.zip | /rsitmd_rsicd_ucm/images_no_tif/ |
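As a rough illustration, the table can be turned into a small lookup that resolves a PF-1M image path against the extracted zip folders; the extraction layout and the example path below are assumptions, not a prescribed setup:

```python
# Hypothetical helper: resolve a PF-1M image path against the extracted zip folders.
# The prefix mapping comes from the table above; it assumes each zip is extracted into a
# folder named after the archive (without ".zip") under `extracted_root`.
import os

ZIP_TO_PF1M_PREFIX = {
    "resized_images_refcocog": "/refcocog/images_with_bbox/",
    "resized_images_looktwice": "/pointingqa-main/Datasets/LookTwiceQA/images_with_points_train/",
    "resized_images_localqa": "/pointingqa-main/Datasets/LocalQA/images_with_points/",
    "resized_images_rsitmd_rsicd_ucm": "/rsitmd_rsicd_ucm/images_no_tif/",
}

def local_path(pf1m_image_path: str, extracted_root: str) -> str:
    """Map an image path from a PF-1M annotation to a file inside the extracted zips."""
    for folder, prefix in ZIP_TO_PF1M_PREFIX.items():
        if pf1m_image_path.startswith(prefix):
            return os.path.join(extracted_root, folder, pf1m_image_path[len(prefix):])
    raise ValueError(f"No zip folder known for {pf1m_image_path}")

# Example (file name is illustrative):
# local_path("/refcocog/images_with_bbox/42636.jpg", "./pf1m_images")
```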
I hope this helps! If you've got any more questions, just let me know.
Hello authors. Thanks for your effort in contributing this dataset, but I am confused about how to align the images in PF-1M with the open-source [PointingQA] dataset. For example,