fawazsammani / nlxgpt

NLX-GPT: A Model for Natural Language Explanations in Vision and Vision-Language Tasks, CVPR 2022 (Oral)

VQA_X finetuning #3

Closed MachinicGlitch closed 2 years ago

MachinicGlitch commented 2 years ago

Hello!

I am attempting to finetune the VQA_X model and am running into some confusion about the data required.

I currently have a dataset of images and captions prepared and formatted similarly to vqaX_test_annot_full.json and vqaX_test_annot_exp.json, with one-to-one image/annotation pairs along with the file path of the JPEG file for each image.

Do I also need to prepare an additional set of data formatted similarly to vqaX_val.json & vqaX_test.json, with answers, explanations, the image_id and the image name, in order to finetune the model, or can I do so with only the dataset mentioned above? A sketch of how I've structured my annotation files follows below.
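For concreteness, here is a minimal sketch of how I've laid out my annotation files, following what I understand the COCO-captioning-style format of vqaX_test_annot_full.json to be (the field names and IDs here are illustrative guesses on my part, not copied from the released files):

```python
import json

# Illustrative annotation structure in the COCO captioning format
# (as I understand it); IDs and caption text are made up.
annot = {
    "annotations": [
        {"image_id": 262148, "caption": "because the player is swinging a bat", "id": 1}
    ],
    "images": [
        {"id": 262148}
    ],
    "type": "captions",
}

# Write out one annotation file per split.
with open("my_annot_full.json", "w") as f:
    json.dump(annot, f)
```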

Thanks

fawazsammani commented 2 years ago

Hi. First, what do you mean by finetuning VQA_X? We finetune the image captioning model on VQA_X. If I understand you correctly, you want to finetune the already-finetuned VQA_X model?

vqaX_test_annot_full.json and vqaX_test_annot_exp.json are only meant for evaluating the explanations with the COCO captioning toolkit, as this is the format it expects. But in the Train, Val and Test data loaders, the files that are loaded are vqaX_train.json, vqaX_val.json and vqaX_test.json. These are the main files loaded during training and validation.
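As a rough sketch, the data loader consumes the training entries along these lines (the path and field names are from memory, so verify them against the released vqaX_train.json before finetuning):

```python
import json

# vqaX_train.json is a dict keyed by question_id; the path below is
# illustrative and may differ in your checkout.
data = json.load(open("nle_data/VQA-X/vqaX_train.json"))
ids_list = list(data.keys())  # one entry per question_id

sample = data[ids_list[0]]
question = sample["question"]           # the question string
answers = sample["answers"]             # annotator answers
explanation = sample["explanation"][0]  # first human explanation
img_name = sample["image_name"]         # used to locate the JPEG on disk
```

So if you want to finetune, your data needs to be in this per-question format; the annot-style files are only needed at evaluation time.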

fawazsammani commented 2 years ago

Feel free to open this issue again if you have further questions!