clin1223 / VLDet

[ICLR 2023] PyTorch implementation of VLDet (https://arxiv.org/abs/2211.14843)

How to get `datasets/cc3m/train_image_info_tags.json` #3

Closed xiaofeng94 closed 1 year ago

xiaofeng94 commented 1 year ago

Dear authors,

Thanks for presenting such great work. I'm very interested in this method and am trying to reproduce the results on my own, but I'm confused by the data preparation for Conceptual Captions.

According to this doc, there seem to be missing steps between `datasets/cc3m/train_image_info.json` and `datasets/cc3m/train_image_info_tags.json`, so `python tools/get_tags_for_VLDet_concepts.py` does not work.

BTW, could you describe `train_image_info.json` and `train_image_info_tags.json`? They are a bit confusing, and I'm wondering where they are used in training.

clin1223 commented 1 year ago

Sorry for the delay. I have just fixed the file name to `train_image_info_tags.json` and updated the repo. `train_image_info_tags.json` is the annotation file of image-text pairs for the CC3M dataset.
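For readers who still want to see what the annotation file contains, here is a minimal sketch that summarizes the top level of a JSON file without assuming any schema. The thread does not state the file's structure, so the helper `summarize_json` is hypothetical and reports only whatever keys it finds.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Return a schema-agnostic summary of a JSON file's top-level structure."""
    with Path(path).open("r", encoding="utf-8") as f:
        data = json.load(f)

    if isinstance(data, dict):
        summary = {}
        for key, value in data.items():
            if isinstance(value, (list, dict)):
                # Containers: report their type and how many entries they hold.
                summary[key] = f"{type(value).__name__} with {len(value)} entries"
            else:
                # Scalars: report only the type.
                summary[key] = type(value).__name__
        return summary
    if isinstance(data, list):
        return {"<top-level list>": f"list with {len(data)} entries"}
    return {"<top-level value>": type(data).__name__}


if __name__ == "__main__":
    # Path taken from the issue title; adjust it to your own checkout.
    target = Path("datasets/cc3m/train_image_info_tags.json")
    if target.exists():
        for key, description in summarize_json(target).items():
            print(f"{key}: {description}")
    else:
        print(f"{target} not found")
```

Note that this loads the whole file into memory, which may be slow for a large annotation file.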

xiaofeng94 commented 1 year ago

Thanks for the reply.

JiuqingDong commented 1 year ago

Hi, did you manage to re-implement this code? I ran into some problems when running it. Could you give me some help and suggestions?

xiaofeng94 commented 1 year ago

Hey @JiuqingDong, I failed to run the code for LVIS and CC3M as well. When I trained the model on LVIS, there was a GPU memory leak: memory usage kept increasing steadily even after 20k iterations, which is not normal. This may be due to a different environment setup, but I didn't check it further. I'm not sure whether the authors have the same problem.

JiuqingDong commented 1 year ago

@xiaofeng94 I ran into the problem while preparing the dataset. Following OVR-CNN to create the open-vocabulary COCO split should give two converted files like this:

    coco/
      zero-shot/
        instances_train2017_seen_2.json
        instances_val2017_all_2.json

I got a lot of errors when creating these two files. Besides, I could not find the file `coco_65_concepts.txt`.

Do you have these three files?

xiaofeng94 commented 1 year ago

@JiuqingDong I don't have `coco_65_concepts.txt`, but I guess you don't need it to run the code.

JiuqingDong commented 1 year ago

@xiaofeng94 I don't know how you can run this code without the data preparation. Without these two files, `instances_train2017_seen_2.json` and `instances_val2017_all_2.json`, I cannot generate `instances_train2017_seen_2_del.json`, and I get the error: FilenotFound:xxxxx file can not found. By the way, which command do you use to run the code?
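The missing-file errors described above can be caught before launching anything with a small pre-flight check. This is only a sketch: the `datasets` root and the nested `coco/zero-shot/` layout are assumptions based on the paths mentioned in this thread, and `find_missing` is a hypothetical helper, not part of the VLDet repo.

```python
from pathlib import Path


def find_missing(root, relative_paths):
    """Return the relative paths that do not exist as files under root."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).is_file()]


# File names are taken from this thread; the directory layout is assumed.
EXPECTED_FILES = [
    "coco/zero-shot/instances_train2017_seen_2.json",
    "coco/zero-shot/instances_val2017_all_2.json",
]

if __name__ == "__main__":
    missing = find_missing("datasets", EXPECTED_FILES)
    if missing:
        print("Missing files:")
        for rel in missing:
            print(f"  {rel}")
    else:
        print("All expected files are present.")
```

Running it from the repository root lists every expected file that is absent, which is easier to act on than a file-not-found error raised partway through a conversion script.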

xiaofeng94 commented 1 year ago

@JiuqingDong I just checked the code and made up anything that was missing.