LinWeizheDragon / Retrieval-Augmented-Visual-Question-Answering

This is the official repository for Retrieval Augmented Visual Question Answering
GNU General Public License v3.0

Can I fine-tune with custom data? #42

zzk2021 closed this issue 5 months ago

zzk2021 commented 5 months ago

Is my understanding correct that fine-tuning takes two steps: first fine-tune the retriever, then fine-tune BLIP?

zzk2021 commented 5 months ago

Also, regarding the ROI part, do the features still need to be extracted with VinVL?

LinWeizheDragon commented 5 months ago

Yes, the code shows how to fine-tune the retriever first and then fine-tune BLIP with the retriever frozen. The ROI features can be extracted by following the instructions in the README. You can also use any other object detection model instead if you don't need to strictly reproduce the numbers in the paper.
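To illustrate the second stage in general terms, here is a minimal PyTorch sketch of training one module while another stays frozen. It is not the repository's training code: the two `nn.Linear` modules are hypothetical stand-ins for the retriever and the BLIP-based answer generator, and the loss is a dummy.

```python
import torch
from torch import nn

# Hypothetical stand-ins; the real retriever and BLIP model are far larger.
retriever = nn.Linear(8, 4)
generator = nn.Linear(4, 2)

# Freeze the retriever: no gradients are computed for it, and eval()
# disables training-only behaviour such as dropout.
for param in retriever.parameters():
    param.requires_grad = False
retriever.eval()

# Give the optimizer only the parameters that should still be trained.
optimizer = torch.optim.AdamW(
    [p for p in generator.parameters() if p.requires_grad], lr=1e-4
)

inputs = torch.randn(3, 8)
with torch.no_grad():                    # the retriever acts as a fixed feature extractor
    features = retriever(inputs)

loss = generator(features).pow(2).mean() # dummy loss, for illustration only
loss.backward()
optimizer.step()
```

After the step, only the generator's weights have changed; the retriever's parameters have no gradients.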

zzk2021 commented 5 months ago

Dear author, I would like to know the format of the file './output/[train/val/test]_predictions.json'. My run failed at the step path/to/azcopy copy 'https://biglmdiag.blob.core.windows.net/vinvl/datasets/coco_caption' ./oscar_dataset --recursive. I opened an issue at "https://github.com/microsoft/Oscar", but I still don't know how to fix it.

LinWeizheDragon commented 5 months ago

Do you want to get the predictions for OKVQA or for custom datasets? The repo has a zip file containing all the pre-extracted files.

zzk2021 commented 5 months ago

> Do you want to get the predictions for OKVQA or custom datasets? In the repo, there is a zip file containing all pre-extracted files.

Thanks! I want to know the format so that I can build my own custom datasets. I think the zip file will solve my problem.
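One way to work out the format from the pre-extracted files is to print a summary of a JSON file's structure rather than reading it by hand. The sketch below uses only the Python standard library and assumes nothing about the actual keys in the predictions files; `describe` and `describe_file` are hypothetical helper names.

```python
import json

def describe(value, depth=0, max_depth=3):
    """Return a compact description of the structure of a parsed JSON value."""
    if isinstance(value, dict):
        if depth >= max_depth:
            return "dict"
        return {key: describe(item, depth + 1, max_depth) for key, item in value.items()}
    if isinstance(value, list):
        if not value or depth >= max_depth:
            return f"list[{len(value)}]"
        # Describe only the first element and assume the rest look alike.
        return [describe(value[0], depth + 1, max_depth), f"... {len(value)} items"]
    return type(value).__name__

def describe_file(path, max_depth=3):
    """Load a JSON file and describe its structure."""
    with open(path, encoding="utf-8") as handle:
        return describe(json.load(handle), max_depth=max_depth)

# Example usage (adjust the path to wherever the zip file was unpacked):
# print(json.dumps(describe_file("./output/train_predictions.json"), indent=2))
```

Because only the first element of each list is inspected, the summary stays short even for large files, but it will miss fields that appear only in later entries.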