csuhan / OneLLM

[CVPR 2024] OneLLM: One Framework to Align All Modalities with Language

Clotho V2 annotation file #20

Open manushree635 opened 6 months ago

manushree635 commented 6 months ago

Hey authors, can y'all share the annotation JSON for the Clotho V2 evaluation?

csuhan commented 6 months ago

Hi @manushree635, please check this page: https://zenodo.org/records/4783391

manushree635 commented 6 months ago

They have a CSV file where each audio clip has 5 captions. How did you convert it to the format required by COCO eval?
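For readers with the same question, here is a minimal sketch of one way to do such a conversion. It is not the authors' script. It assumes the CSV has a `file_name` column followed by `caption_1` through `caption_5`, and that the target is the general COCO caption layout (an `images` list plus an `annotations` list linked by `image_id`). The function name `clotho_csv_to_coco` and the integer ids are hypothetical choices made for illustration.

```python
import csv
import io
import json


def clotho_csv_to_coco(csv_file, num_captions=5):
    """Convert Clotho-style caption rows into a COCO-caption-style dict.

    Assumes columns: file_name, caption_1 ... caption_<num_captions>.
    """
    images, annotations = [], []
    for image_id, row in enumerate(csv.DictReader(csv_file)):
        # One "image" entry per audio clip; COCO tooling keys on the id.
        images.append({"id": image_id, "file_name": row["file_name"]})
        # One annotation per reference caption, all sharing the clip's id.
        for k in range(1, num_captions + 1):
            annotations.append({
                "id": len(annotations),
                "image_id": image_id,
                "caption": row[f"caption_{k}"].strip(),
            })
    return {"images": images, "annotations": annotations}


# Tiny in-memory example with made-up captions (not real Clotho data).
sample = io.StringIO(
    "file_name,caption_1,caption_2,caption_3,caption_4,caption_5\n"
    "a.wav,c1,c2,c3,c4,c5\n"
)
coco = clotho_csv_to_coco(sample)
print(json.dumps(coco, indent=2))
```

With a real file, the same function could be called on `open(path, newline="", encoding="utf-8")` and the result written out with `json.dump`. Whether the ids and keys match what a particular evaluation script expects is something to check against that script.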

manushree635 commented 6 months ago

I'm not able to replicate the Clotho V2 numbers reported in the paper when using all 5 reference captions in Clotho. If you could guide me with this, it would be really helpful.

Also, the SPIDEr score reported for LTU is incorrect: you have reported the SPICE score instead of SPIDEr. That is an unfair comparison, given how much SPIDEr and SPICE scores differ.
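For context on why the two metrics are not interchangeable: SPIDEr is commonly defined as the arithmetic mean of CIDEr and SPICE, so it only equals SPICE when CIDEr and SPICE happen to coincide. The short sketch below illustrates this. The input values are placeholders chosen for illustration only, not scores from the paper or from any model.

```python
def spider(cider: float, spice: float) -> float:
    """SPIDEr as the arithmetic mean of CIDEr and SPICE."""
    return (cider + spice) / 2.0


# Placeholder values for illustration only (not real results).
example_cider = 0.40
example_spice = 0.12

example_spider = spider(example_cider, example_spice)
print(f"SPICE:  {example_spice:.3f}")
print(f"SPIDEr: {example_spider:.3f}")
```

Because CIDEr is typically much larger than SPICE on captioning benchmarks, a SPICE value placed in a SPIDEr column will usually look noticeably lower than the true SPIDEr.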

csuhan commented 6 months ago

The converted annotation file is at: https://huggingface.co/datasets/csuhan/OneLLM_Eval/blob/main/audio/clothov2/eval_clothocap_ann.json

Sorry for the confusion about LTU. We will update the table soon.

manushree635 commented 6 months ago

Hey @csuhan, I used the annotation file and the checkpoint mentioned in the README. These are the scores I'm getting, which are way off from the scores reported in the paper. Is there something I'm doing wrong? Can you please help me with this?

Bleu_1: 0.481
Bleu_2: 0.271
Bleu_3: 0.165
Bleu_4: 0.100
METEOR: 0.139
ROUGE_L: 0.321
CIDEr: 0.237