Open ShaneeyS opened 2 days ago
An update:
I have fixed a bug in my code, and things now seem normal. However, the metrics I get on MSVD-QA are still much lower than those reported in the paper (about 10 points lower in accuracy):
All evaluation completed!
Yes count: 2855
No count: 1139
Accuracy: 0.714822
Average score: 3.827742
Total Score Yes/No distribution:
yes:
0: 0
1: 0
2: 0
3: 1
4: 705
5: 2149
no:
0: 296
1: 3
2: 805
3: 33
4: 2
5: 0
Answer Type Score distribution:
Type, Accuracy, Avg_score
total, 0.714822, 3.827742
acc, score, total
0.714822, 3.827742, 0.714822
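As a sanity check that the log is internally consistent, the summary numbers can be recomputed from the score distribution. This is only a minimal sketch: it assumes accuracy is the fraction of "yes" judgments and the average score is the mean over all judged answers, which matches the logged values but may not be exactly how the repo's evaluation script computes them. The function name `summarize` is hypothetical.

```python
# Score histograms copied from the evaluation log in this post.
yes_scores = {0: 0, 1: 0, 2: 0, 3: 1, 4: 705, 5: 2149}
no_scores = {0: 296, 1: 3, 2: 805, 3: 33, 4: 2, 5: 0}


def summarize(yes_hist, no_hist):
    """Recompute counts, accuracy, and mean score from two histograms.

    Assumption: accuracy = yes / (yes + no), and the average score is the
    mean score over all judged answers.
    """
    yes_count = sum(yes_hist.values())
    no_count = sum(no_hist.values())
    total = yes_count + no_count
    score_sum = sum(score * n for score, n in yes_hist.items())
    score_sum += sum(score * n for score, n in no_hist.items())
    return yes_count, no_count, yes_count / total, score_sum / total


yes_count, no_count, accuracy, avg_score = summarize(yes_scores, no_scores)
print(f"Yes count: {yes_count}")
print(f"No count: {no_count}")
print(f"Accuracy: {accuracy:.6f}")
print(f"Average score: {avg_score:.6f}")
```

The recomputed values agree with the log, so the gap to the paper does not look like an aggregation error on my side.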
Are there any special modifications needed to reproduce the results?
Thanks!
Hi Authors,
Thanks for your great work! It's an amazing contribution to the video understanding task.
However, when I tried to reproduce the results reported in the paper, I ran into several problems.
I followed the training script in this repo to pretrain and finetune the model on 8 A100 GPUs, then ran evaluation on the MSVD dataset. However, the accuracy is very low:
When I try to evaluate with the provided checkpoint https://huggingface.co/IVGSZ/Flash-VStream-7b instead, I get the following error:
Could you please help me with these problems, or point out where I may have made a mistake?
Thanks!