Aleph-Alpha / magma

MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multilingual models from Aleph Alpha check out our website https://app.aleph-alpha.com
MIT License

How to reproduce the n-shot VQA, OKVQA and GQA results from Table 1? No eval script is available #45

Open sanyalsunny111 opened 1 year ago

sanyalsunny111 commented 1 year ago

Hey Authors,

Awesome work! Could you please provide the few-shot/zero-shot eval script to reproduce your VQA/OKVQA/GQA results in Table 1? cc @benbrandt @countably1nfinite @sdtblck @Mayukhdeb

SamSoup commented 1 year ago

Seconded! It would be very helpful to have the evaluation scripts.

CoEich commented 1 year ago

Hi,

Currently the eval scripts are not in a releasable state. I might clean them up and add them at some point, but I can't promise anything.

We largely follow the implementation in https://github.com/GT-Vision-Lab/VQA/tree/master.
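
As a rough illustration of the answer-accuracy metric used in that repository (this is not the unreleased MAGMA eval script): to the best of my understanding, a prediction is scored against the human answers by holding out each human answer in turn and taking min(1, matches / 3) over the remaining ones. The function names below are hypothetical, and the normalization is a simplified stand-in for the official one, which also handles contractions and number words.

```python
import re
from typing import List

# Simplified stand-ins for the official normalization rules (assumption).
_ARTICLES = {"a", "an", "the"}
_PUNCT = re.compile(r"[^\w\s]")


def normalize_answer(answer: str) -> str:
    """Lowercase, strip punctuation, and drop articles (simplified)."""
    answer = answer.lower().replace("\n", " ").strip()
    answer = _PUNCT.sub("", answer)
    words = [w for w in answer.split() if w not in _ARTICLES]
    return " ".join(words)


def vqa_accuracy(prediction: str, human_answers: List[str]) -> float:
    """Leave-one-out VQA accuracy for a single question.

    For each human answer, count how many of the *other* human answers
    match the prediction, score min(1, matches / 3), then average.
    """
    pred = normalize_answer(prediction)
    gts = [normalize_answer(a) for a in human_answers]
    scores = []
    for i in range(len(gts)):
        others = gts[:i] + gts[i + 1:]
        matches = sum(1 for a in others if a == pred)
        scores.append(min(1.0, matches / 3.0))
    return sum(scores) / len(scores)
```

For example, with ten human answers of which two are "dog" and eight are "cat", the prediction "A dog." scores about 0.6, while any prediction matched by at least four annotators scores 1.0. How the model's free-form generation is truncated into a short answer before scoring is a separate choice that this sketch does not cover.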

The rest of the details are in our paper: https://arxiv.org/abs/2112.05253

I hope this helps. If any questions remain, feel free to ask :-)

Best,

Constantin