Yuqifan1117 / CaCao

This is the official repository for the paper "Visually-Prompted Language Model for Fine-Grained Scene Graph Generation in an Open World" (Accepted by ICCV 2023)

Dataset used for predicate extraction #10

Closed Yassin-fan closed 10 months ago

Yassin-fan commented 10 months ago

Hello, while reading triplet_extraction.py, I noticed that the triplet-extraction pipeline only processes coco/image_caption.json and saves the results in "total_image_region_triplets.json"; the cc3m dataset mentioned in the paper is not used.

In addition, vg/region_descriptions.json is processed separately and saved in "image_caption_triplet.json".

I also checked the two JSON files in the "dataset" folder, and their images correspond to the COCO dataset as well.
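For reference, here is a minimal sketch of what a caption-to-triplet step might look like. It assumes a spaCy-based subject–verb–object heuristic and an `{image_id: [captions]}` layout for image_caption.json, so the repository's actual triplet_extraction.py may well work differently:

```python
# Hypothetical sketch of caption-to-triplet extraction with spaCy;
# the repository's actual triplet_extraction.py may differ.
import json
import spacy

nlp = spacy.load("en_core_web_sm")

def extract_triplets(caption):
    """Return naive (subject, predicate, object) triplets via dependency parsing."""
    triplets = []
    for token in nlp(caption):
        if token.pos_ != "VERB":
            continue
        subjects = [c for c in token.children if c.dep_ in ("nsubj", "nsubjpass")]
        objects = [c for c in token.children if c.dep_ in ("dobj", "attr")]
        for subj in subjects:
            for obj in objects:
                triplets.append((subj.lemma_, token.lemma_, obj.lemma_))
    return triplets

# Assumed layout: {image_id: [caption, ...]}; adjust to the real file schema.
with open("coco/image_caption.json") as f:
    image_captions = json.load(f)

all_triplets = {
    image_id: [t for cap in caps for t in extract_triplets(cap)]
    for image_id, caps in image_captions.items()
}

with open("total_image_region_triplets.json", "w") as f:
    json.dump(all_triplets, f)
```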

My questions are:

  1. Has the cc3m dataset been abandoned in favor of the VG dataset as a supplement? If so, when augmenting with the VG dataset, can the model be considered to have already been trained on this data?
  2. Is the coco/image_caption.json file a direct merge of the training and evaluation splits of the 2014 version?

Thanks for your help!

Yuqifan1117 commented 10 months ago

Thanks!

  1. We collect data directly from the Internet. Because COCO's labels are clearer and more accurate, we ultimately use the COCO dataset. The region_descriptions.json file is only used to explore the quality of the dataset; we do not use it for supplementary training (it is not contained in image_caption_triplet_all.json, which is used for CaCao training).
  2. Yes, we merge the training data of the 2014 version into image_caption.json.
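For anyone reproducing this step, below is a minimal sketch of how such a merge might be built from the official COCO 2014 caption annotations. The file paths and output layout are assumptions, not the repository's actual script:

```python
# Hypothetical sketch: build coco/image_caption.json from COCO 2014
# caption annotations. File paths and output layout are assumptions.
import json
from collections import defaultdict

def load_captions(path):
    """Group COCO caption annotations by image_id."""
    with open(path) as f:
        coco = json.load(f)
    per_image = defaultdict(list)
    for ann in coco["annotations"]:
        per_image[ann["image_id"]].append(ann["caption"])
    return per_image

merged = defaultdict(list)
# Per the reply above, at least the 2014 training split is included;
# add "annotations/captions_val2014.json" here if the val split is too.
for path in ["annotations/captions_train2014.json"]:
    for image_id, caps in load_captions(path).items():
        merged[image_id].extend(caps)

with open("coco/image_caption.json", "w") as f:
    json.dump(merged, f)
```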