dvlab-research / LLaMA-VID

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models (ECCV 2024)
Apache License 2.0

Long Video CLI wrong #48

Closed QiSu77 closed 8 months ago

QiSu77 commented 9 months ago

When I run:

```
python llamavid/serve/run_llamavid_movie.py --model-path work_dirs/llama-vid/llama-vid-7b-full-224-long-video --video-file ./featuresTest/test.pkl --load-4bit --question 'summarize video information'
```

I get this error:

```
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Freezing pretrained qformer weights...
Loading pretrained weights...
Loading vlm_att_query weights...
Loading vlm_att_ln weights...
Text with video
Traceback (most recent call last):
  File "/home/deepspeed/multimodal/LLaMA-VID/llamavid/serve/run_llamavid_movie.py", line 113, in <module>
    run_inference(args)
  File "/home/deepspeed/multimodal/LLaMA-VID/llamavid/serve/run_llamavid_movie.py", line 73, in run_inference
    conv = conv_templates[args.conv_mode].copy()
KeyError: None
```

Is the problem with `parser.add_argument("--conv-mode", type=str, default=None)`? What should I do? Thanks!

wcy1122 commented 8 months ago

Hi, we have added `'vicuna_v1'` as the default value of `--conv-mode`. Please check our latest code.
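For anyone on an older checkout, the change amounts to giving the flag a concrete default. A minimal sketch, assuming the argument is defined in `llamavid/serve/run_llamavid_movie.py` as the traceback suggests:

```python
import argparse

parser = argparse.ArgumentParser()
# Give --conv-mode a concrete default so that
# conv_templates[args.conv_mode] no longer raises KeyError: None
# when the flag is omitted on the command line.
parser.add_argument("--conv-mode", type=str, default="vicuna_v1")
```

Equivalently, without updating the code, you can pass `--conv-mode vicuna_v1` explicitly on the command line.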

QiSu77 commented 8 months ago

Thanks a lot!