LLaVA-VL / LLaVA-NeXT

Apache License 2.0

Where is the module llavavid? #124

Open Leon1207 opened 1 month ago

uahic commented 1 month ago

(screenshot attachment)

Leon1207 commented 1 month ago

Thanks, but I can't find this dir in the latest version of the code.

uahic commented 1 month ago

Hmm, you're right. Either they silently removed this module or replaced it. Quite a lot of new code has been added; have a look at the video demo code:

https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/playground/demo/video_demo.py

tsaiJN commented 1 month ago

+1, can we bring it back / update code?

ZhangYuanhan-AI commented 1 month ago

Hi all!

We have refactored our code: the old llavavid module has been merged into llava. You can use the code in the main branch for both training and inference of the video model.

Leon1207 commented 1 month ago

Thanks for your great work! Can I use the 'llava-v1.6-vicuna-7b' weights directly in the video demo, i.e. run LLaVA-NeXT on the video task without fine-tuning on video data? For example: `bash scripts/video/demo/video_demo.sh /data/llava-next_weights/llava-v1.6-vicuna-7b vicuna_v1 12 1 average after no_token True playground/demo/xU25MMA2N4aVtYay.mp4`

ZhangYuanhan-AI commented 1 month ago

Sure you can. However, the command should be

`bash scripts/video/demo/video_demo.sh /data/llava-next_weights/llava-v1.6-vicuna-7b vicuna_v1 12 1 average no_token True playground/demo/xU25MMA2N4aVtYay.mp4`

In our main branch the default pooling position is `after`, so you do not need to specify it.
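The "you do not need to specify this" behavior is just a CLI default. A minimal argparse sketch of the idea (the flag name `--pooling_position` is hypothetical, not the script's real argument):

```python
import argparse

# Hypothetical flag name for illustration; the demo script's real CLI may differ.
parser = argparse.ArgumentParser()
parser.add_argument("--pooling_position", default="after",
                    choices=["before", "after"])

args = parser.parse_args([])  # flag omitted on the command line
print(args.pooling_position)  # prints "after": omitting the flag equals passing the default
```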

Leon1207 commented 1 month ago

> Sure you can. However, the command should be
>
> `bash scripts/video/demo/video_demo.sh /data/llava-next_weights/llava-v1.6-vicuna-7b vicuna_v1 12 1 average no_token True playground/demo/xU25MMA2N4aVtYay.mp4`
>
> In our main branch the default pooling position is `after`, so you do not need to specify it.

Kindly thank you for your reply! When I run your latest code directly with the command `bash scripts/video/demo/video_demo.sh /data/llava-next_weights/llava-v1.6-vicuna-7b vicuna_v1 12 1 average no_token True playground/demo/xU25MMA2N4aVtYay.mp4`, it fails with:

    Traceback (most recent call last):
      File "/home/lyd/LLaVA-NeXT/playground/demo/video_demo.py", line 314, in <module>
        run_inference(args)
      File "/home/lyd/LLaVA-NeXT/playground/demo/video_demo.py", line 226, in run_inference
        if "mistral" not in cfg_pretrained._name_or_path.lower():
    UnboundLocalError: local variable 'cfg_pretrained' referenced before assignment

The same error occurs with `bash scripts/video/demo/video_demo.sh /data/llava-next_weights/LLaVA-NeXT-Video-7B-DPO vicuna_v1 12 1 average no_token False playground/demo/xU25MMA2N4aVtYay.mp4`.
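Reading the traceback, the likely failure mode is a variable that is only bound inside a conditional branch. The sketch below reproduces that pattern; the names mirror the traceback, but the body is an assumption, not the actual video_demo.py code:

```python
def run_inference(model_path: str, overwrite: bool) -> str:
    # Sketch of the suspected bug: cfg_pretrained is bound only in this branch.
    if overwrite:
        cfg_pretrained = {"_name_or_path": model_path}
    # With overwrite=False the next line raises
    # UnboundLocalError: local variable 'cfg_pretrained' referenced before assignment.
    if "mistral" not in cfg_pretrained["_name_or_path"].lower():
        return "non-mistral path"
    return "mistral path"
```

This is consistent with the fix suggested below: passing `overwrite=True` takes the branch that binds the variable.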

ZhangYuanhan-AI commented 1 month ago

Oh, you should set `overwrite=True`

Leon1207 commented 1 month ago

> Oh, you should set `overwrite=True`

Oh, I figured out why. Thank you for your patience!

ZhangYuanhan-AI commented 1 month ago

Anyway, the bug is fixed. Please pull the latest code.

Leon1207 commented 1 month ago

Thanks!

Leon1207 commented 1 month ago

Sorry for bothering you again. When I ran `bash scripts/video/demo/video_demo.sh /data/llava-next_weights/llava-v1.6-vicuna-7b vicuna_v1 12 1 average no_token True playground/demo/xU25MMA2N4aVtYay.mp4` directly on your latest code, I got the error `ValueError: Model llava-v1.6-vicuna-7b not supported`. Is there any other way I can use llava-v1.6-vicuna-7b to perform the video task?

ZhangYuanhan-AI commented 1 month ago
(screenshot attachment)

Works from my side

Leon1207 commented 1 month ago

Oh, I figured it out: I had not installed flash-attn, so setting `attn_implementation="eager"` solves it. Referring to the readme you provided, setting `attn_implementation="None"` does not seem to be supported?
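One way to avoid this class of error is to pick the attention backend based on whether flash-attn is actually importable. A minimal sketch, assuming the loader accepts `attn_implementation` the way recent `transformers` `from_pretrained` does; the helper name is made up for illustration:

```python
import importlib.util

def pick_attn_implementation() -> str:
    """Return "flash_attention_2" only when the flash_attn package is
    importable; otherwise fall back to "eager".  (Helper name is a
    hypothetical convenience, not part of the LLaVA-NeXT codebase.)"""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    return "eager"

# The result would then be forwarded to the model loader, e.g.
#   load_pretrained_model(..., attn_implementation=pick_attn_implementation())
```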