Hi, just ignore this error.
Hi,
Thanks for your quick reply. Sorry, but let me ask in more detail.
I used the inference branch to run video_demo.sh and got the error above. Can you briefly explain why I can ignore the warnings? It seems the weights for the visual encoder are not loaded.
Or do you think I should use the video_inference branch to run video_demo.sh? I tried that as well, but I get an error that flash-attn is missing. I pasted the error I got on the video_inference branch below.
The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.
Traceback (most recent call last):
File "/efs/users/jongwp/LLaVA-NeXT/playground/demo/video_demo.py", line 181, in
Hi, this is because we use delay_load=True to initialize our model (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/6944062b9bb2e61c48436f1a65c3ea339095ec91/llavavid/model/llava_arch.py#L41). That means the vision_tower is not yet initialized when we load the checkpoint, which is what raises this warning.
The weights are then loaded later at: https://github.com/LLaVA-VL/LLaVA-NeXT/blob/6944062b9bb2e61c48436f1a65c3ea339095ec91/llavavid/model/builder.py#L162
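For intuition, here is a toy PyTorch sketch of that two-stage pattern (assumed names, not the actual LLaVA-NeXT code): the checkpoint is applied before the vision tower exists, so its keys surface as "not used", and a second step loads them afterwards.

```python
import torch.nn as nn

class VisionTower(nn.Module):
    """Stand-in for the CLIP vision encoder."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(4, 4)

class ToyLlavaModel(nn.Module):
    def __init__(self, delay_load=True):
        super().__init__()
        self.lm_head = nn.Linear(4, 4)
        # With delay_load=True the tower is not built yet, so none of its
        # parameters exist when the checkpoint is applied.
        self.vision_tower = None if delay_load else VisionTower()

    def load_vision_tower(self, ckpt):
        # Second stage, analogous to the linked builder.py line: build the
        # tower and load the weights that were reported as "not used".
        self.vision_tower = VisionTower()
        prefix = "vision_tower."
        tower_sd = {k[len(prefix):]: v for k, v in ckpt.items()
                    if k.startswith(prefix)}
        self.vision_tower.load_state_dict(tower_sd)

# A full checkpoint, including the tower weights.
ckpt = ToyLlavaModel(delay_load=False).state_dict()

model = ToyLlavaModel(delay_load=True)
# strict=False mirrors from_pretrained(): the tower keys have no destination
# yet, so they surface as unexpected / "weights were not used".
result = model.load_state_dict(ckpt, strict=False)
print("unused keys:", result.unexpected_keys)  # vision_tower.proj.*

model.load_vision_tower(ckpt)  # loaded later, so the warning is benign
```

The same thing happens in the real builder: from_pretrained() triggers the warning, and the step at the linked builder.py line then fills in the vision tower weights.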
Thanks for the detailed answers. I really appreciate it.
Sorry, but one more question: based on your answers, I should run video_demo.sh on the inference branch. When should we use the video_inference branch, then?
You can see it as a dev branch; it will be deleted soon.
Hi,
I used the commands below to run the model, and the weights seem to have downloaded correctly. However, I received a message about missing parameters when I ran video_demo.sh.
bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-7B-DPO vicuna_v1 20 2 True /home/ubuntu/EgoSchema/videos_ex/3f61b913-9920-4fbd-ba0f-93f41c255279_croped.mp4
bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-7B vicuna_v1 20 2 True /home/ubuntu/EgoSchema/videos_ex/3f61b913-9920-4fbd-ba0f-93f41c255279_croped.mp4
====== missing parameters message ======
Some weights of the model checkpoint at lmms-lab/LLaVA-NeXT-Video-7B were not used when initializing LlavaLlamaForCausalLM: ['model.vision_tower.vision_tower.vision_model.embeddings.class_embedding', 'model.vision_tower.vision_tower.vision_model.embeddings.patch_embedding.weight', 'model.vision_tower.vision_tower.vision_model.embeddings.position_embedding.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight', 'model.vision