RitaRamo / smallcap

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation
94 stars 20 forks

Will you provide the evaluation on the out-of-domain datasets and the enriched datastore? #6

Closed dongwhfdyer closed 1 year ago

dongwhfdyer commented 1 year ago

Hey! We are doing research following your work. We have reproduced your results on the COCO dataset, and they perform very well, even better than your provided data! But for the other datasets referred to in your paper, like VizWiz and MSR-VTT, we find it complicated to obtain the corresponding metric results. Could you provide the code for these experiments? [screenshot: results table]

In addition, is the datastore/coco_index_captions.json the augmented datastore you mentioned in 5.2 Augmenting the datastore?

YovaKem commented 1 year ago

Hi @dongwhfdyer, glad to hear you've been able to reproduce the COCO-index results. @RitaRamo and I will follow up on your main request shortly.

For now I can clarify that datastore/coco_index_captions.json is just a file containing the captions (the actual text) associated with datastore/coco_index (which only contains vectors). You can see here that both of these files are created with reference to just the COCO dataset.
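To illustrate the relationship described above: the index file holds only vectors, and the i-th vector corresponds to the i-th entry of the captions file, so the integer positions returned by a nearest-neighbour search index directly into the caption list. A minimal sketch, with inlined stand-in data and a hypothetical helper name:

```python
# Hypothetical sketch: the datastore index returns integer positions;
# the caption text lives in a parallel list loaded from the *_captions.json file.
captions = ["a cat on a mat", "a dog in a park", "a red bus on a street"]

def captions_for_neighbors(neighbor_ids, caption_list):
    """Map the integer ids returned by an index search back to caption text."""
    return [caption_list[i] for i in neighbor_ids]

# e.g. a search that returned positions [2, 0]
print(captions_for_neighbors([2, 0], captions))
```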

dongwhfdyer commented 1 year ago

That's great! And will you release the augmented datastore in the future?

YovaKem commented 1 year ago

@dongwhfdyer the easiest way to reproduce the results from the table above would be for us to share the retrieved neighbors with you (the equivalent of data/retrieved_caps_resnet50x64.json for each datastore and dataset combination in the table). If you need the actual index for each of the datastore configurations, we can also provide that, although some of the files will be quite large, so we have to see where we can host them for sharing.

If the trouble you are facing is just with the actual evaluation script (i.e. with adapting the coco-captions metric computation to the other datasets), let us know and we'll share the relevant files with you.
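For anyone adapting the coco-captions metric computation to another dataset, the main step is wrapping that dataset's references and predictions in the COCO annotation schema so the standard evaluation toolkit can consume them unchanged. A rough sketch under that assumption (the helper name and input shapes are illustrative, not from the repo):

```python
# Hypothetical sketch: convert references {image_id: [caption, ...]} and
# predictions {image_id: caption} into COCO-style ground-truth and result
# structures, mirroring the field names of the COCO caption schema.
def to_coco_format(references, predictions):
    annotations = []
    ann_id = 0
    for image_id, caps in references.items():
        for cap in caps:
            annotations.append({"image_id": image_id, "id": ann_id, "caption": cap})
            ann_id += 1
    gts = {"images": [{"id": i} for i in references], "annotations": annotations}
    res = [{"image_id": i, "caption": c} for i, c in predictions.items()]
    return gts, res
```

These two structures are what the usual coco-caption tokenizer and scorers expect as input.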

dongwhfdyer commented 1 year ago

Thank you! I think I need both. 😄

YovaKem commented 1 year ago

Just to make sure, of the three things I mentioned: nearest neighbors, indexes, and evaluation files, which ones do you need?

dongwhfdyer commented 1 year ago

I need nearest neighbors, indexes, and evaluation files.

RitaRamo commented 1 year ago

Sure, we'll try to provide as soon as possible, probably tomorrow :)

kondounagi commented 1 year ago

@RitaRamo Hello,

Thank you for your fantastic work! I would appreciate it if you could share nearest neighbors, indexes, and evaluation files with me.

RitaRamo commented 1 year ago

Hello,

I was busy last week, thanks for waiting! The nearest neighbors and evaluation files are here.

You can find the datastores on HF.

dongwhfdyer commented 1 year ago

Thank you a lot!! We are now trying to reproduce your work on other datasets. ☺ But for Flickr30k, I am rather confused about its nearest-neighbors files (setup_in_domain.json, setup_in_domain_web.json, etc.) and its index files (captions_test2014_new.json, captions_val2014_new.json): their indexes don't match each other. For example, for the image files listed in captions_test2014_new.json, I can't find their corresponding retrieved captions, which I could easily do for COCO with your provided dataset_coco.json and retrieved_caps_resnet50x64.json. Could you explain what the indexes I highlighted in setup_in_domain.json refer to? [screenshot: setup_in_domain.json excerpt]

RitaRamo commented 1 year ago

Hi,

Those ids correspond to the ids of the images. As in your example, id 67 can be found in captions_val2014_new.json:

`{"id": 67, "width": 0, "height": 0, "file_name": "1018148011.jpg", ...}`

This means the id 67 corresponds to the Flickr30k image "1018148011.jpg".

All the validation images and test images are in captions_val2014_new.json, so please ignore captions_test2014_new.json.
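The lookup described above can be sketched in a few lines: build an id-to-filename map from the image entries in captions_val2014_new.json, then join it with the retrieved captions keyed by image id. The inlined dictionaries below are stand-ins for the file contents; field names follow the snippet above:

```python
# Hypothetical sketch: join image ids from setup_in_domain.json with the
# file names listed in captions_val2014_new.json.
val_annotations = {
    "images": [
        {"id": 67, "width": 0, "height": 0, "file_name": "1018148011.jpg"},
    ]
}
# retrieved captions keyed by image id, as in the nns file
retrieved = {"67": ["two dogs play in the grass", "a dog runs through a field"]}

id_to_file = {img["id"]: img["file_name"] for img in val_annotations["images"]}

for image_id, caps in retrieved.items():
    print(id_to_file[int(image_id)], caps)
```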

taewhankim commented 8 months ago

Hi~ Thanks for sharing the data! Could you share the MSR-VTT frames mentioned in captions_val2014_new.json, or pseudocode for extracting them?