GLAMOR-USC / CLiMB

The Continual Learning in Multimodality Benchmark
MIT License

Populating drawn_images for VCR dataset #3

Open bhat-prashant opened 1 year ago

bhat-prashant commented 1 year ago

First of all, a great repository for multimodal continual learning. Kudos!

I have a question about how to populate drawn_images for the VCR dataset. The instructions say:

The drawn_images folder for the VCR task can be generated from the original vcr1images, using the scripts available [here](https://github.com/rowanz/merlot/tree/main/downstream/vcr/data).

The link you mention contains three scripts: prepare_data, draw_segms, and draw_bbox. As I understand it, these scripts draw segmentations and bounding boxes on the images in vcr1images.

Could you please clarify how to populate drawn_images? What are the expected folder structure and contents?

Thanks in advance,

tejas1995 commented 1 year ago

Hello, sorry for the late response.

You should be able to modify this script. The drawn_images folder contains two folders: bbox and bbox_logs. bbox contains two folders, train and val, each of which contains a qa folder and a qar folder holding the drawn images.
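
As a rough sketch of that layout, the snippet below builds the folder skeleton and reports any folder that is missing. It covers only the directory names mentioned above. The helper names (`expected_dirs`, `make_layout`, `missing_dirs`) are hypothetical and are not part of CLiMB or the MERLOT scripts. The contents of bbox_logs and the image file names are not specified here, so the sketch does not assume them.

```python
from pathlib import Path


def expected_dirs(root):
    """Return the directories described above, relative to `root` (the drawn_images folder)."""
    root = Path(root)
    dirs = [root / "bbox_logs"]
    for split in ("train", "val"):
        for task in ("qa", "qar"):
            # Drawn images for each split/task pair go in these leaf folders.
            dirs.append(root / "bbox" / split / task)
    return dirs


def make_layout(root):
    """Create the empty folder skeleton (hypothetical helper, for illustration only)."""
    for d in expected_dirs(root):
        d.mkdir(parents=True, exist_ok=True)


def missing_dirs(root):
    """List the expected directories that do not exist yet under `root`."""
    return [d for d in expected_dirs(root) if not d.is_dir()]
```

Calling `make_layout("drawn_images")` creates the empty folders, and `missing_dirs("drawn_images")` can be used afterwards to check that the drawing script wrote into the expected places.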