TXH-mercury / VALOR

Codes and Models for VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset
https://arxiv.org/abs/2304.08345
MIT License

Different Results on msrvtt-1kA #17

Closed: YasmineXXX closed this issue 1 year ago

YasmineXXX commented 1 year ago

Thanks for sharing your work!!!

I ran the test code provided in README.md on msrvtt-1kA and obtained the following results:

07/18/2023 17:38:05 - INFO - main -   ====-zero-shot evaluation--ret%tva%tv--msrvtt_ret_t_v========

07/18/2023 17:38:05 - INFO - main -   {'video_recall': '39.9/69.2/78.8', 'video_ravg': 62.6, 'video_medianR': 2.0, 'video_meanR': 17.953125}
07/18/2023 17:38:05 - INFO - main -   ====-zero-shot evaluation--ret%tva%tv--msrvtt_ret_t_va========

07/18/2023 17:38:05 - INFO - main -   {'video_recall': '43.0/72.1/82.1', 'video_ravg': 65.7, 'video_medianR': 2.0, 'video_meanR': 15.1953125}

Why are these results much lower than the officially reported numbers?

TXH-mercury commented 1 year ago

image

It matches the performance in the yellow line, which is for the base model + VA setting.
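
For anyone comparing their own numbers against the logs in this thread, here is a minimal sketch of how metrics of this shape (R@1/R@5/R@10, their average, median rank, mean rank) are commonly computed from a text-to-video similarity matrix. It is not taken from the VALOR code base: the function name `retrieval_metrics`, the toy matrix, and the tie-handling rule are assumptions for illustration, and the repository's evaluation may differ in details. One thing that can be checked against the logs directly is that `video_ravg` is the mean of the three recall values, e.g. (39.9 + 69.2 + 78.8) / 3 ≈ 62.6.

```python
from statistics import median


def retrieval_metrics(sim):
    """Hypothetical helper: sim[i][j] is the score of text query i against video j.

    Assumes the correct video for query i is video i (one caption per video).
    """
    ranks = []
    for i, row in enumerate(sim):
        # 1-based rank of the correct video: one plus the number of videos
        # scored strictly higher (ties resolved in favour of the correct video,
        # which is an assumption of this sketch).
        rank = 1 + sum(1 for score in row if score > row[i])
        ranks.append(rank)

    n = len(ranks)
    # Recall@K: percentage of queries whose correct video is ranked in the top K.
    recall = {k: 100.0 * sum(r <= k for r in ranks) / n for k in (1, 5, 10)}

    return {
        "video_recall": "/".join(f"{recall[k]:.1f}" for k in (1, 5, 10)),
        "video_ravg": round(sum(recall.values()) / 3, 1),
        "video_medianR": float(median(ranks)),
        "video_meanR": sum(ranks) / n,
    }


# Toy 3x3 example: query 1 ranks its correct video second, the others first.
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.5, 0.1],
    [0.3, 0.2, 0.7],
]
metrics = retrieval_metrics(sim)
print(metrics)
```

Running the same function on similarity matrices from the `tv` (text-to-video) and `tva` (text-to-video+audio) settings should make it easier to see which row of a results table a given run corresponds to.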