bytedance / Flash-VStream

This is the official implementation of "Flash-VStream: Memory-Based Real-Time Understanding for Long Video Streams"
https://invinciblewyq.github.io/vstream-page/
Apache License 2.0

Unable to reproduce the results reported in your paper #2

Open ShaneeyS opened 2 days ago

ShaneeyS commented 2 days ago

Hi Authors,

First of all, thanks for your great work! It's an amazing contribution to video understanding.

However, I ran into several problems when trying to reproduce the results reported in the paper.

I followed the training script in this repo, pretrained and finetuned the model on 8 A100 GPUs, and evaluated on the MSVD dataset. However, the accuracy is very low:

Yes count: 5350
No count: 7802
Accuracy: 0.406782
Average score: 2.606600

Total Score Yes/No distribution:
yes:
0: 0
1: 0
2: 0
3: 2
4: 1432
5: 3916
no:
0: 3401
1: 78
2: 4137
3: 139
4: 36
5: 11

Answer Type Score distribution:
Type, Accuracy, Avg_score
total, 0.406782, 2.606600

acc, score, total
0.406782, 2.606600, 0.406782

Also, when I use the provided checkpoint https://huggingface.co/IVGSZ/Flash-VStream-7b for evaluation, I get the following error:

Traceback (most recent call last):
  File "/sh/Flash-VStream-main/flash_vstream/eval_video/model_msvd_qa_featuresloader.py", line 181, in <module>
    run_inference(args)
  File "/sh/Flash-VStream-main/flash_vstream/eval_video/model_msvd_qa_featuresloader.py", line 150, in run_inference
    output_ids = model.generate(
  File "/sh/anaconda3/envs/python/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/sh/anaconda3/envs/python/lib/python3.10/site-packages/transformers/generation/utils.py", line 1588, in generate
    return self.sample(
  File "/sh/anaconda3/envs/python/lib/python3.10/site-packages/transformers/generation/utils.py", line 2678, in sample
    next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
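
For reference, a minimal sketch of how this kind of failure can be localized. It does not identify the actual cause here; it only shows that `torch.multinomial` rejects a probability tensor containing NaN, inf, or negative entries, and how such entries can be counted beforehand. The helper name `count_invalid_probs` and the toy logits are hypothetical and not part of this repo.

```python
import torch


def count_invalid_probs(probs: torch.Tensor) -> int:
    """Count entries that torch.multinomial would reject (NaN, inf, or negative).

    Hypothetical helper for debugging only; not part of Flash-VStream.
    """
    invalid = torch.isnan(probs) | torch.isinf(probs) | (probs < 0)
    return int(invalid.sum().item())


# Toy logits: -inf is a legitimate mask value, NaN is not.
masked_logits = torch.tensor([[0.0, float("-inf"), 1.0]])
broken_logits = torch.tensor([[0.1, float("nan"), 0.3]])

masked_probs = torch.softmax(masked_logits, dim=-1)
broken_probs = torch.softmax(broken_logits, dim=-1)

n_invalid_masked = count_invalid_probs(masked_probs)
n_invalid_broken = count_invalid_probs(broken_probs)

# Sampling works when every probability is valid.
next_token = torch.multinomial(masked_probs, num_samples=1).squeeze(1)

# Sampling from the broken distribution raises a RuntimeError.
try:
    torch.multinomial(broken_probs, num_samples=1)
    sampling_failed = False
except RuntimeError:
    sampling_failed = True
```

If a check like this reports invalid entries on the real model outputs, the non-finite values were produced somewhere upstream of the sampling step (for example in the weights, the inputs, or the numeric precision), which is where to look next.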

Could you please help me with these problems, or point out anything I might have done wrong?

Thanks!

ShaneeyS commented 2 days ago

An update:

I have fixed a bug in the code, and things now seem to run normally. However, the resulting metrics on MSVD-QA are still considerably lower than those in the paper (about 10 points lower in accuracy):

All evaluation completed!
Yes count: 2855
No count: 1139
Accuracy: 0.714822
Average score: 3.827742

Total Score Yes/No distribution:
yes:
0: 0
1: 0
2: 0
3: 1
4: 705
5: 2149
no:
0: 296
1: 3
2: 805
3: 33
4: 2
5: 0

Answer Type Score distribution:
Type, Accuracy, Avg_score
total, 0.714822, 3.827742

acc, score, total
0.714822, 3.827742, 0.714822
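
As a sanity check, the summary numbers in both logs can be recomputed from the score distributions. The sketch below assumes that accuracy is the fraction of "yes" judgments and that the average score is the mean over all judged answers; the function and variable names are illustrative only.

```python
def summarize(yes_hist: dict, no_hist: dict) -> dict:
    """Recompute totals from per-score counts (score -> number of answers)."""
    yes = sum(yes_hist.values())
    no = sum(no_hist.values())
    total = yes + no
    score_sum = sum(
        score * count
        for hist in (yes_hist, no_hist)
        for score, count in hist.items()
    )
    return {
        "total": total,
        "accuracy": yes / total,
        "avg_score": score_sum / total,
    }


# Distributions copied from the two evaluation logs in this thread.
first_run = summarize(
    yes_hist={0: 0, 1: 0, 2: 0, 3: 2, 4: 1432, 5: 3916},
    no_hist={0: 3401, 1: 78, 2: 4137, 3: 139, 4: 36, 5: 11},
)
second_run = summarize(
    yes_hist={0: 0, 1: 0, 2: 0, 3: 1, 4: 705, 5: 2149},
    no_hist={0: 296, 1: 3, 2: 805, 3: 33, 4: 2, 5: 0},
)

for name, run in (("first run", first_run), ("second run", second_run)):
    print(f"{name}: n={run['total']} "
          f"acc={run['accuracy']:.6f} avg_score={run['avg_score']:.6f}")
```

Under these assumptions the recomputed values match both logs. The totals also show that the first run covers 13152 judged answers and the second only 3994, so the two accuracies are not computed over the same number of samples.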

Are there any special modifications needed to reproduce the results?

Thanks!