Aleph-Alpha / magma

MAGMA - a GPT-style multimodal model that can understand any combination of images and language. NOTE: The freely available model from this repo is only a demo. For the latest multimodal and multilingual models from Aleph Alpha check out our website https://app.aleph-alpha.com
MIT License

Reproducing results from your paper #42

Closed: Golovneva closed this issue 1 year ago

Golovneva commented 1 year ago

Hi! Thank you for sharing the code for your model. I'm having trouble reproducing the results published in your paper. Here are the scores I get on the COCO dataset using the checkpoint provided in your paper:

```
{'Bleu_1': 0.22440850959728406, 'Bleu_2': 0.11753228266783161, 'Bleu_3': 0.06043320902662557, 'Bleu_4': 0.0321128847993337, 'METEOR': 0.09099773362803487, 'ROUGE_L': 0.16770810280576667, 'CIDEr': 0.11203192991375235}
```

As you can see, they are significantly lower. I'm using the nlg-eval package, as you mentioned here.
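For context on what these numbers measure: BLEU_1 is modified unigram precision scaled by a brevity penalty. This is a minimal sentence-level sketch for illustration, not the nlg-eval implementation (which works corpus-level over files):

```python
import math
from collections import Counter

def bleu_1(hypothesis, references):
    """Modified unigram precision with brevity penalty (sentence-level sketch)."""
    hyp = hypothesis.split()
    hyp_counts = Counter(hyp)
    # Clip each hypothesis unigram count by its maximum count in any reference.
    max_ref = Counter()
    for ref in references:
        for tok, n in Counter(ref.split()).items():
            max_ref[tok] = max(max_ref[tok], n)
    clipped = sum(min(n, max_ref[tok]) for tok, n in hyp_counts.items())
    precision = clipped / len(hyp)
    # Brevity penalty against the closest reference length.
    r = min((len(ref.split()) for ref in references),
            key=lambda rl: (abs(rl - len(hyp)), rl))
    bp = 1.0 if len(hyp) >= r else math.exp(1 - r / len(hyp))
    return bp * precision
```

For example, `bleu_1("a cat sits on the mat", ["the cat sits on the mat"])` scores 5/6: five of six hypothesis tokens match, with no length penalty.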

Which model in the paper does your checkpoint correspond to, base or long? And how do you initialize it for evaluation? Here is my setup:

```python
model = Magma.from_checkpoint(
    config_path=os.path.join(model_path, "configs/MAGMA_v1.yml"),
    checkpoint_path="mp_rank_00_model_states.pt",
    device="cuda:0",
)
```

As the prompt I'm using "A picture of ". Is that correct? I'm sampling with temperature=0.7 and setting torch's manual seed to 42.

Is there anything I'm missing or doing wrong here? If everything looks fine, could you please share the evaluation scripts that reproduce your results?

CoEich commented 1 year ago

Hi,

thanks for your interest in our work. Regarding your questions:

Please understand that I cannot share further resources, although using nlg-eval should be rather straightforward. With a lower temperature and without the trailing whitespace in the prompt, I would expect the results to improve.

Best,

Constantin

Golovneva commented 1 year ago

Thank you! Changing the prompt and temperature helped improve the scores, although they are still lower than those reported in the paper:

```
{'Bleu_1': 0.39627885456680473, 'Bleu_2': 0.2550440831428488, 'Bleu_3': 0.16363634146091482, 'Bleu_4': 0.10697364143823233, 'METEOR': 0.1581395520589721, 'ROUGE_L': 0.3094927895838829, 'CIDEr': 0.3734865136851766}
```
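Comparing the two runs posted in this thread side by side shows how much the prompt and temperature changes mattered; a quick sketch (values rounded from the dicts above):

```python
# Scores from the two evaluation runs posted in this thread (rounded).
run_1 = {'Bleu_1': 0.2244, 'Bleu_4': 0.0321, 'METEOR': 0.0910,
         'ROUGE_L': 0.1677, 'CIDEr': 0.1120}
run_2 = {'Bleu_1': 0.3963, 'Bleu_4': 0.1070, 'METEOR': 0.1581,
         'ROUGE_L': 0.3095, 'CIDEr': 0.3735}

# Per-metric ratio: the factor by which the second run beats the first.
gain = {k: run_2[k] / run_1[k] for k in run_1}
```

Every metric improves, with CIDEr and BLEU_4 gaining by more than a factor of three, so the remaining gap to the paper is likely a decoding or preprocessing detail rather than the wrong checkpoint.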