Hi,
thanks for your interest in our work. Regarding your questions:
Please understand that I cannot share further resources, although using nlg-eval should be rather straightforward. With a lower temperature and without whitespace at the end of the prompt, I would expect the results to improve.
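For example, something along these lines (a sketch only, based on the generation API from the README; the exact `max_steps` and `temperature` values are illustrative, not the settings behind the paper numbers):

```python
from magma import Magma
from magma.image_input import ImageInput

model = Magma.from_checkpoint(
    config_path="configs/MAGMA_v1.yml",
    checkpoint_path="mp_rank_00_model_states.pt",
    device="cuda:0",
)

prompt = "A picture of"  # no trailing whitespace
embeddings = model.preprocess_inputs([ImageInput("image.jpg"), prompt])
output = model.generate(
    embeddings=embeddings,
    max_steps=30,     # illustrative caption length limit
    temperature=0.1,  # lower temperature, closer to greedy decoding
    top_k=0,
)
```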
Best,
Constantin
Thank you! Changing the prompt and temperature helped to improve the scores, although they are still lower than reported in the paper:

```
{'Bleu_1': 0.39627885456680473, 'Bleu_2': 0.2550440831428488, 'Bleu_3': 0.16363634146091482, 'Bleu_4': 0.10697364143823233, 'METEOR': 0.1581395520589721, 'ROUGE_L': 0.3094927895838829, 'CIDEr': 0.3734865136851766}
```
Hi! Thank you for sharing the code for your model. I'm having trouble reproducing the results published in your paper. Here are the scores I get on the COCO dataset using the checkpoint provided with your paper:

```
{'Bleu_1': 0.22440850959728406, 'Bleu_2': 0.11753228266783161, 'Bleu_3': 0.06043320902662557, 'Bleu_4': 0.0321128847993337, 'METEOR': 0.09099773362803487, 'ROUGE_L': 0.16770810280576667, 'CIDEr': 0.11203192991375235}
```
As you can see, they are significantly lower. I'm using the nlg-eval package, as you mentioned here.
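For completeness, I compute the metrics roughly like this (the file names are placeholders):

```python
from nlgeval import compute_metrics

# hyp.txt: one generated caption per line
# ref.txt: the corresponding COCO reference captions, line-aligned with hyp.txt
metrics_dict = compute_metrics(
    hypothesis="hyp.txt",
    references=["ref.txt"],
)
print(metrics_dict)
```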
What model does your checkpoint correspond to in the paper, `base` or `long`? How do you initialize it for evaluation? Here is my setup:

```python
import os
from magma import Magma

# model_path points at the directory containing the repo configs
model = Magma.from_checkpoint(
    config_path=os.path.join(model_path, "configs/MAGMA_v1.yml"),
    checkpoint_path="mp_rank_00_model_states.pt",
    device="cuda:0",
)
```
As a prompt I'm using "A picture of " (with a trailing space) - is that correct? I'm generating with `temperature=0.7`, and I also set torch's manual seed to 42.
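Concretely, for each image my generation step looks roughly like this (the image list and `max_steps` value are placeholders):

```python
import torch
from magma.image_input import ImageInput

torch.manual_seed(42)

for image_path in coco_image_paths:  # placeholder: list of COCO image files
    embeddings = model.preprocess_inputs([ImageInput(image_path), "A picture of "])
    caption = model.generate(
        embeddings=embeddings,
        max_steps=30,  # placeholder
        temperature=0.7,
    )[0]  # generate returns a list with one string per batch element
```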
Is there anything I'm missing or doing wrong here? If everything looks fine, could you please share the evaluation scripts that reproduce the reported results?