OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Apache License 2.0

Why does the fine-tuned VQA model generate only `yes`, `no`, or an empty string when I set `--unconstrained-training`? #371

Closed. yuezhao238 closed this issue 1 year ago.

yuezhao238 commented 1 year ago

Thank you for this great work and for your previous replies! I am new to this field, and here are my questions about fine-tuning VQA on my own data. Thanks for reading! The answers in my data contain only chemical symbols and terminology, and every question is "What is the chemical reaction in the picture?", so I think the answers should be treated as a different language. I am trying to fine-tune the model on the vqa_gen task, but the fine-tuned checkpoint seems no different from the original pretrained checkpoint. I suspect the problem is the dict.txt, encoder.json, and vocab.bpe files in utils.BPE. How can I generate these files from my own corpus (the data['answer'] and data['question'] fields mentioned above)? After generating them, can I simply remove `--freeze-encoder-embedding` and `--freeze-decoder-embedding` from the script, or is there anything else I need to do? Thanks for your reply!
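On the first question (regenerating dict.txt, encoder.json, and vocab.bpe), here is a minimal sketch of the dict.txt part only. It assumes that the dict.txt in utils.BPE follows fairseq's plain-text dictionary format: one `<symbol> <count>` pair per line, most frequent first. It also assumes the input is already BPE-encoded, whitespace-separated token ids; the GPT-2 BPE encoding step that uses encoder.json and vocab.bpe is not shown. `build_fairseq_dict` is a hypothetical helper, not part of OFA or fairseq.

```python
from collections import Counter


def build_fairseq_dict(tokenized_lines):
    """Hypothetical helper: count whitespace-separated symbols and return
    dict.txt lines in the assumed fairseq '<symbol> <count>' format."""
    counts = Counter()
    for line in tokenized_lines:
        counts.update(line.split())
    # Most frequent symbols first; ties broken alphabetically so the output
    # is reproducible across runs.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{symbol} {count}" for symbol, count in ordered]


if __name__ == "__main__":
    # Two already-BPE-encoded lines (token ids as strings).
    for entry in build_fairseq_dict(["31373 995", "31373 13"]):
        print(entry)
```

Writing the result with `"\n".join(lines)` gives a file in that format. Keep in mind that a new dictionary changes the vocabulary size, so the embedding weights in a pretrained checkpoint would likely no longer line up with it.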

yuezhao238 commented 1 year ago

Sorry, I misunderstood the files mentioned above, but my question remains. I am now trying `--unconstrained-training` without trainval_ans2label.pkl. However, the fine-tuned model generates only "yes" or "no" when I run the code you provide in Colab, and it generates an empty string when I run evaluate_vqa_unconstrained.sh. Do you have any suggestions? Also, how long does this fine-tuning take on a single RTX 3090 (24 GB)? And is there any difference between fine-tuning from pretrained.pt and from finetuned_on_vqa.pt? More specifically, I pass data to the model in the order id, image, question, answer, and I expect the model to generate answers in the same form as data['answer'] (which is actually JSON). Thank you for any reply!
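The comment says the data is passed as id, image, question, answer. For comparison, here is a minimal sketch of writing one training row. It assumes the VQA layout described in the OFA README: question-id, image-id, question, answer with confidence, optional predicted object labels, and base64 image. It also assumes answers are written as `confidence|!+answer` with multiple candidates joined by `&&`. The column order, the separators, and the helper name `make_vqa_row` are assumptions for illustration; check them against the README and the column selection in your training script. Because the answers here are JSON, the sketch serializes them compactly. `json.dumps` escapes newlines and tabs inside strings, so the answer stays on one line and cannot split the TSV row.

```python
import base64
import json


def make_vqa_row(question_id, image_id, question, answer, image_bytes,
                 object_labels="", confidence=1.0):
    """Hypothetical helper: build one tab-separated row in the assumed layout
    question_id, image_id, question, 'confidence|!+answer', labels, base64 image."""
    # Serialize structured answers onto a single line with compact separators.
    if isinstance(answer, (dict, list)):
        answer = json.dumps(answer, ensure_ascii=False, separators=(",", ":"))
    # Any raw tab or newline would break the TSV layout.
    for field in (question, answer, object_labels):
        if "\t" in field or "\n" in field:
            raise ValueError(f"field contains a tab or newline: {field!r}")
    # '&&' and '|!+' are assumed to be separators in the answer column.
    if "&&" in answer or "|!+" in answer:
        raise ValueError(f"answer contains a reserved separator: {answer!r}")
    image_b64 = base64.b64encode(image_bytes).decode("utf-8")
    ref = f"{confidence}|!+{answer}"
    return "\t".join([str(question_id), str(image_id), question,
                      ref, object_labels, image_b64])


if __name__ == "__main__":
    row = make_vqa_row(
        1, 1, "What is the chemical reaction in the picture?",
        {"reactants": ["H2", "O2"], "products": ["H2O"]},
        b"\x89PNG",  # stand-in for real image bytes
    )
    print(row.split("\t")[3])
```

If the existing rows have only four columns, or the answer column lacks the confidence prefix, the dataset loader may read the wrong fields. That would be worth ruling out before looking at training settings.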