bytedance / Shot2Story

A new multi-shot video understanding benchmark Shot2Story with comprehensive video summaries and detailed shot-level captions.
https://mingfei.info/shot2story

run demo_video.py failed. #11

Open yangyuya opened 2 months ago

yangyuya commented 2 months ago

Thanks for your work!

I don't load Whisper and I set all ASR inputs to empty. Then I try to run demo_video.py using only upload_vid, as follows:

video = "examples/videos/v_-EIsT868Trw.mp4"
text_input = "What is the woman doing?"
input_split = "0 3.1\n3 11.5\n11.5 24.2\n24.2 45"
chat_state = CONV_VISION_MS.copy()
upload_vid(video, text_input, chat_state, temperature=1.0, input_splits=input_split)

But it failed:

1. I get this error:

RuntimeError: probability tensor contains either `inf`, `nan` or element < 0

2. If I set do_sample=False in "self.model.llama_model.generate", the error above goes away, but the "summary" becomes:

examples/videos/v_-EIsT868Trw.mp4 <unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk><unk>

I can't solve this problem. Could you help me test my own video without Whisper?
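
For context, this error message is the one PyTorch's `torch.multinomial` raises when the probabilities it is asked to sample from are invalid, which usually means the logits already contained `nan` or `inf` before sampling. Setting do_sample=False skips that check but does not repair the logits, which may be why the output degenerates into `<unk>` tokens. The sketch below is plain Python, not the repository's code; `softmax` and `is_valid_distribution` are hypothetical helper names, and it only illustrates how a single `nan` logit poisons the whole distribution.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def is_valid_distribution(probs):
    """True only if every probability is finite and non-negative."""
    return all(math.isfinite(p) and p >= 0 for p in probs)


if __name__ == "__main__":
    healthy = softmax([2.0, 1.0, 0.1])
    broken = softmax([2.0, float("nan"), 0.1])  # one bad logit
    print("healthy ok:", is_valid_distribution(healthy))
    print("broken ok:", is_valid_distribution(broken))
```

If a check like this fails on the real logits, the cause is likely upstream of sampling (for example the loaded weights, the dtype, or the inputs), so changing do_sample alone would not be expected to fix it.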

youthHan commented 2 months ago

Hi, so you want to run the demo without ASR, and even without loading the Whisper model?

youthHan commented 2 months ago

Please change the demo.yaml config to disable ASR: set https://github.com/bytedance/Shot2Story/blob/7ddced29ff8d639e52543bf13e801f3ba0b26d46/code/lavis/projects/blip2/eval/demo.yaml#L20 to False, and set https://github.com/bytedance/Shot2Story/blob/5bc6cf8d351a407230ab82c961ba3834526c2bdd/code/lavis/projects/blip2/eval/demo.yaml#L27 to "Describe this video in detail."

yangyuya commented 2 months ago

> Hi, so you want to run the demo without ASR, and even without loading the Whisper model?

Yes, because the videos in ActivityNet Captions don't have ASR, and if I load the Whisper model I get an OOM error on a 4090 GPU with 24 GB.

I have already changed the demo.yaml config to disable ASR, but I still get the same problem as above.

youthHan commented 1 month ago

@yangyuya Hi, have you solved this issue on the 4090 GPU? Would you mind providing more details about your environment and the errors? I haven't been able to find a 4090 GPU to test on.
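
A small script along these lines could collect the environment details asked for above. It is only a sketch: `collect_environment` is a hypothetical helper, not part of Shot2Story, and it assumes that PyTorch, if installed, is the framework the demo runs on.

```python
import platform
import sys


def collect_environment():
    """Gather basic environment details worth pasting into a bug report."""
    info = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    }
    try:
        import torch  # optional: only reported when PyTorch is installed
    except ImportError:
        info["torch"] = "not installed"
        return info

    info["torch"] = torch.__version__
    info["cuda_build"] = torch.version.cuda  # None on CPU-only builds
    if torch.cuda.is_available():
        info["gpu"] = torch.cuda.get_device_name(0)
        total = torch.cuda.get_device_properties(0).total_memory
        info["gpu_memory_gib"] = round(total / 2**30, 1)
    else:
        info["gpu"] = "no CUDA device visible"
    return info


if __name__ == "__main__":
    for key, value in collect_environment().items():
        print(f"{key}: {value}")
```

PyTorch also ships a fuller report via `python -m torch.utils.collect_env`, which may be more convenient to paste into the issue.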