MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multilingual models from Aleph Alpha check out our website https://app.aleph-alpha.com
Hi, thanks for the awesome project.
I noticed that the reported BLEU-4 and CIDEr scores in Table 1 are ~10 and ~50 on the MS COCO dataset (zero-shot; after fine-tuning they increase to 31 and 90+, respectively), which fall far behind traditional baselines like AoA and CLIP-ViL (these usually achieve ~40 BLEU-4 and 120+ CIDEr).
I am wondering whether the difference is due to the evaluation setup: did you use the evaluation code in coco-caption, or calculate the scores yourself?
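For context on what is being compared, here is a minimal pure-Python sketch of sentence-level BLEU-4 (uniform weights, brevity penalty, no smoothing). This is a simplified illustration of the metric, not the coco-caption implementation, which computes corpus-level BLEU by aggregating clipped counts over the whole test set; small implementation differences like this can shift reported scores.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu4(candidate, references):
    """Sentence-level BLEU-4 sketch.
    candidate: list of tokens; references: list of token lists."""
    precisions = []
    for n in range(1, 5):
        cand_counts = Counter(ngrams(candidate, n))
        if not cand_counts:
            return 0.0  # candidate too short to contain any n-grams
        # clip each candidate n-gram count by its max count over the references
        max_ref = Counter()
        for ref in references:
            for g, c in Counter(ngrams(ref, n)).items():
                max_ref[g] = max(max_ref[g], c)
        clipped = sum(min(c, max_ref[g]) for g, c in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing: any zero precision zeroes the score
        precisions.append(clipped / sum(cand_counts.values()))
    # brevity penalty against the closest reference length
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c >= r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

A candidate identical to its reference scores 1.0; a candidate with no matching 4-grams scores 0.0 under this unsmoothed variant.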