RenShuhuai-Andy / TimeChat

[CVPR 2024] TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
https://arxiv.org/abs/2312.02051
BSD 3-Clause "New" or "Revised" License

Bad performance of Charades #14

Closed (soyeonhong closed this issue 5 months ago)

soyeonhong commented 6 months ago

I evaluated the checkpoint provided in the repo on the Charades dataset and got 27.9 for R@1 (IoU = 0.5) and 12.3 for R@1 (IoU = 0.7). According to the paper, however, R@1 (IoU = 0.5) should be 32.2 and R@1 (IoU = 0.7) should be 13.4. My R@1 (IoU = 0.5) is especially low. Could you tell me which parameters or methods I need to change to close this gap?
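For reference, this is how I understand the metric: R@1 at an IoU threshold is the percentage of queries whose top-1 predicted segment overlaps the ground-truth segment with a temporal IoU at or above that threshold. The sketch below is a minimal illustration, not the repo's evaluation script; the names `temporal_iou` and `recall_at_1` are hypothetical, and whether the comparison is `>=` or `>` is an assumption.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two time spans, in [0, 1]."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    # Zero-length spans (start == end) can make the union zero.
    return inter / union if union > 0 else 0.0


def recall_at_1(preds: List[Span], gts: List[Span], threshold: float) -> float:
    """Percentage of queries whose top-1 span reaches the IoU threshold.

    Assumption: a prediction counts as a hit when IoU >= threshold.
    """
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return 100.0 * hits / len(gts)
```

With one span pair that matches exactly and one pair that overlaps by a third, `recall_at_1(..., threshold=0.5)` counts only the first as a hit.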

RenShuhuai-Andy commented 5 months ago

Hi, thanks for your interest.

Our released ckpt is different from the version used in the paper. The released ckpt was trained after we cleaned up the code and fixed a minor bug in the QuerYD instruction data (some videos have identical start and end timestamps in the raw annotation file, so the revised data uses only one timestamp for them). In our evaluation, the released ckpt performs better on YouCook2 than reported in the paper, but worse on Charades-STA and QVHighlights. We also note that the output generated by the LLM differs from run to run, which may cause fluctuations in the evaluation results.
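One way to account for this run-to-run variation is to repeat the evaluation several times and compare the mean and standard deviation rather than a single number. The snippet below is only a sketch under that assumption; `summarize_runs` is a hypothetical helper and not part of the TimeChat code.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_runs(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of one metric across runs."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return mean(scores), stdev(scores)
```

Passing the R@1 (IoU = 0.5) values from several repeated evaluations of the same ckpt gives a rough sense of how much of a gap could be sampling noise.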

We have uploaded the ckpt used in our paper; please refer to https://huggingface.co/ShuhuaiRen/TimeChat-7b-paper. With this ckpt, I believe you can reproduce the results in our paper.