manushree635 opened this issue 6 months ago
Hi @manushree635, please check this page: https://zenodo.org/records/4783391
It has a CSV file where each audio clip has 5 captions. How did you convert it to the format required by the COCO evaluation code?

I'm not able to replicate the ClothoV2 numbers reported in the paper when using all 5 reference captions from Clotho. If you could guide me with this, it would be really helpful.

Also, the SPIDEr score reported for LTU is incorrect: you have reported its SPICE score instead of SPIDEr. That is an unfair comparison, given how much SPIDEr and SPICE scores differ (SPIDEr is the average of the SPICE and CIDEr scores).
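For anyone stuck on the same conversion question: below is a minimal sketch of one way to turn the Clotho CSV into COCO-caption-style annotations. It assumes the column names `file_name` and `caption_1` ... `caption_5` from the Clotho release; the authors' exact conversion script may differ.

```python
import csv
import json

def clotho_csv_to_coco(csv_path: str, out_path: str) -> None:
    """Convert a Clotho captions CSV (5 captions per clip) into the
    COCO-caption annotation layout expected by pycocotools/pycocoevalcap."""
    images, annotations = [], []
    ann_id = 0
    with open(csv_path, newline="", encoding="utf-8") as f:
        for img_id, row in enumerate(csv.DictReader(f)):
            images.append({"id": img_id, "file_name": row["file_name"]})
            for k in range(1, 6):  # five reference captions per audio clip
                annotations.append({
                    "id": ann_id,
                    "image_id": img_id,
                    "caption": row[f"caption_{k}"],
                })
                ann_id += 1
    coco = {
        "info": {"description": "ClothoV2 evaluation captions"},
        "licenses": [],
        "type": "captions",
        "images": images,
        "annotations": annotations,
    }
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(coco, f)

# e.g. clotho_csv_to_coco("clotho_captions_evaluation.csv", "eval_clothocap_ann.json")
```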
The converted annotation file is at: https://huggingface.co/datasets/csuhan/OneLLM_Eval/blob/main/audio/clothov2/eval_clothocap_ann.json
Sorry for the confusion about LTU. We will update the table soon.
Hey @csuhan, I used the annotation file and the checkpoint mentioned in the README. These are the scores I'm getting, which are way off from the scores reported in the paper. Is there something I'm doing wrong? Could you please help me with this?
Bleu_1: 0.481
Bleu_2: 0.271
Bleu_3: 0.165
Bleu_4: 0.100
METEOR: 0.139
ROUGE_L: 0.321
CIDEr: 0.237
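As a sanity check that we're comparing the same pipeline, this is the standard pycocoevalcap evaluation loop that produces these metrics (a sketch; both file names are placeholders, and the predictions file is assumed to be a JSON list of `{"image_id": ..., "caption": "..."}` entries):

```python
from pycocotools.coco import COCO
from pycocoevalcap.eval import COCOEvalCap

coco = COCO("eval_clothocap_ann.json")             # 5 reference captions per clip
coco_res = coco.loadRes("model_predictions.json")  # one generated caption per clip

coco_eval = COCOEvalCap(coco, coco_res)
coco_eval.params["image_id"] = coco_res.getImgIds()
coco_eval.evaluate()

for metric, score in coco_eval.eval.items():
    print(f"{metric}: {score:.3f}")
```

Note that pycocoevalcap does not report SPIDEr directly; it has to be computed afterwards as the average of the CIDEr and SPICE scores.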
Hey authors, can y'all share the annotation JSON for the ClothoV2 evaluation?