DavidHuji / CapDec

CapDec: SOTA Zero Shot Image Captioning Using CLIP and GPT2, EMNLP 2022 (findings)
MIT License

how to download / create single_caption_per_sample_val.json file #7

Open BoiAkay opened 1 year ago

BoiAkay commented 1 year ago

Can anyone please help me with how to generate the single_caption_per_sample_val.json file mentioned in embeddings_generator.py, as shown below?

annotations_path = f'/home/gamir/DER-Roei/davidn/myprivate_coco/annotations/single_caption_per_sample_val.json'

DavidHuji commented 1 year ago

Hi, here are the instructions. Please let me know if you encounter any issue.

gWeiXP commented 11 months ago

Hi, here are the instructions. Please let me know if you encounter any issue.

Hi, I have the same problem: I couldn't find single_caption_per_sample_val.json. Also, what does it mean to set dataset_mode to 0.5, 1.5, 2.5, etc. in embeddings_generator.py?

gWeiXP commented 10 months ago

I gave up and instead used other code from GitHub for the evaluation, following https://github.com/jmhessel/clipscore. Simply save the generated captions and the reference (label) captions into two lists and feed them to clipscore.py.
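For example, something like this (a rough, untested sketch; the file names and example strings are just placeholders, and you should check the clipscore README for the exact input format clipscore.py expects):

```python
import json

# Rough sketch: collect the model's generated captions and the ground-truth
# (label) captions into two aligned lists, then hand them to clipscore.py.
# The strings and file names below are placeholders.
generated = ["a dog running on the beach", "two people riding bikes on a road"]
references = ["a dog runs along the shore", "a couple cycling down a street"]
assert len(generated) == len(references)

with open("generated_captions.json", "w") as f:
    json.dump(generated, f)
with open("reference_captions.json", "w") as f:
    json.dump(references, f)

# Then run clipscore.py from https://github.com/jmhessel/clipscore on these
# files (see that repo's README for the exact command-line arguments).
```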

DavidHuji commented 10 months ago

Hi, sorry for the confusion. The JSON (single_caption_per_sample_val) holds the caption data per id, and it is generated by the parse_karpathy script. So once you download the data from the sources mentioned in the README, you can use parse_karpathy to pre-process it and generate a JSON in the single_caption_per_sample_val format. Then simply use that JSON as the input for embeddings_generator.

The different dataset_mode values in embeddings_generator are just something internal that was useful for me, since I wanted to have a mode per dataset (it makes the ~10 different paths easier to manage), but you can definitely ignore them and just assign your own JSON to 'annotations_path'. Hope this helps. Once I have some free time I'll update the code to make it easier to use.
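For reference, a minimal sketch of what that pre-processing step looks like conceptually (untested; the {"image_id", "caption"} output fields are an assumption, so check parse_karpathy.py for the actual schema that embeddings_generator.py expects):

```python
import json

# Untested sketch: read the Karpathy split file (dataset_coco.json) and keep
# a single caption per validation image. The output field names are assumed,
# not taken from parse_karpathy.py.
with open("dataset_coco.json") as f:
    karpathy = json.load(f)

val_annotations = []
for img in karpathy["images"]:
    if img["split"] != "val":
        continue
    caption = img["sentences"][0]["raw"]  # keep one caption per sample
    val_annotations.append({"image_id": img["cocoid"], "caption": caption})

with open("single_caption_per_sample_val.json", "w") as f:
    json.dump(val_annotations, f)
```

Then point 'annotations_path' in embeddings_generator.py at this file and ignore dataset_mode.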

qq123aa456 commented 10 months ago

Thank you so much for your reply. Could you please give us some instructions on how to get scores like BLEU and CIDEr?

qq123aa456 commented 10 months ago

@wxpqq826615304 I'll try this, thanks.