LLaVA-VL / LLaVA-NeXT


Missing Parameters While Loading the Checkpoints #60

Closed jongwoopark7978 closed 2 weeks ago

jongwoopark7978 commented 2 weeks ago

Hi,

I used the commands below to run the model, and the weights seem to have downloaded correctly. However, when I run video_demo.sh I get a message saying that the model is missing parameters.

bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-7B-DPO vicuna_v1 20 2 True /home/ubuntu/EgoSchema/videos_ex/3f61b913-9920-4fbd-ba0f-93f41c255279_croped.mp4

bash scripts/video/demo/video_demo.sh lmms-lab/LLaVA-NeXT-Video-7B vicuna_v1 20 2 True /home/ubuntu/EgoSchema/videos_ex/3f61b913-9920-4fbd-ba0f-93f41c255279_croped.mp4

====== missing parameters message ====== Some weights of the model checkpoint at lmms-lab/LLaVA-NeXT-Video-7B were not used when initializing LlavaLlamaForCausalLM: ['model.vision_tower.vision_tower.vision_model.embeddings.class_embedding', 'model.vision_tower.vision_tower.vision_model.embeddings.patch_embedding.weight', 'model.vision_tower.vision_tower.vision_model.embeddings.position_embedding.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.layer_norm2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc1.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc2.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.mlp.fc2.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.k_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.out_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.q_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.0.self_attn.v_proj.weight', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm1.bias', 'model.vision_tower.vision_tower.vision_model.encoder.layers.1.layer_norm1.weight', 'model.vision

ZhangYuanhan-AI commented 2 weeks ago

Hi, you can just ignore this warning.

jongwoopark7978 commented 2 weeks ago

Hi,

Thanks for your quick reply. Sorry, but let me ask in more detail.

I used the inference branch to run video_demo.sh and got the message above. Could you briefly explain why I can ignore these warnings? It looks like the weights for the visual encoder are not being loaded.

Or do you think I should use the video_inference branch to run video_demo.sh? I tried that as well, but I get an error saying flash_attn is not installed. I have pasted the error from the video_inference branch below.

The model was loaded with use_flash_attention_2=True, which is deprecated and may be removed in a future release. Please use attn_implementation="flash_attention_2" instead.

Traceback (most recent call last):
  File "/efs/users/jongwp/LLaVA-NeXT/playground/demo/video_demo.py", line 181, in <module>
    run_inference(args)
  File "/efs/users/jongwp/LLaVA-NeXT/playground/demo/video_demo.py", line 106, in run_inference
    tokenizer, model, image_processor, context_len = load_pretrained_model(args.model_path, args.model_base, model_name, load_8bit=args.load_8bit, overwrite_config=overwrite_config)
  File "/efs/users/jongwp/LLaVA-NeXT/llava/model/builder.py", line 125, in load_pretrained_model
    model = LlavaLlamaForCausalLM.from_pretrained(model_path, low_cpu_mem_usage=True, use_flash_attention_2=True, config=cfg_pretrained, **kwargs)
  File "/efs/users/jongwp/envs/llavaVideo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3398, in from_pretrained
    config = cls._autoset_attn_implementation(
  File "/efs/users/jongwp/envs/llavaVideo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1377, in _autoset_attn_implementation
    cls._check_and_enable_flash_attn_2(
  File "/efs/users/jongwp/envs/llavaVideo/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1469, in _check_and_enable_flash_attn_2
    raise ImportError(f"{preface} the package flash_attn seems to be not installed. {install_message}")
ImportError: FlashAttention2 has been toggled on, but it cannot be used due to the following error: the package flash_attn seems to be not installed. Please refer to the documentation of https://huggingface.co/docs/transformers/perf_infer_gpu_one#flashattention-2 to install Flash Attention 2.
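From the deprecation notice, it looks like the loader should pass attn_implementation instead of hard-coding use_flash_attention_2=True. If I understand correctly, something like the sketch below (placeholder checkpoint name, not the repo's actual builder.py) would avoid the hard failure when flash_attn is absent:

# Rough sketch only: pick flash_attention_2 when the flash_attn package is
# importable, otherwise fall back to PyTorch's built-in SDPA attention.
import importlib.util
from transformers import AutoModelForCausalLM

attn_impl = "flash_attention_2" if importlib.util.find_spec("flash_attn") else "sdpa"
model = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5",          # placeholder; the demo really loads LlavaLlamaForCausalLM
    attn_implementation=attn_impl,   # replaces the deprecated use_flash_attention_2=True
    low_cpu_mem_usage=True,
)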

ZhangYuanhan-AI commented 2 weeks ago

Hi, this is because we use delay_load=True when initializing our model (https://github.com/LLaVA-VL/LLaVA-NeXT/blob/6944062b9bb2e61c48436f1a65c3ea339095ec91/llavavid/model/llava_arch.py#L41). That means the vision_tower is not yet instantiated when the checkpoint is loaded, which is what raises this warning.

The vision tower weights are then loaded later, at: https://github.com/LLaVA-VL/LLaVA-NeXT/blob/6944062b9bb2e61c48436f1a65c3ea339095ec91/llavavid/model/builder.py#L162
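Roughly, the pattern looks like this (a simplified sketch with illustrative names, not the exact code in llavavid/model/llava_arch.py):

# Simplified sketch of the delay-load pattern; class and attribute names are
# illustrative, not the repo's actual implementation.
from transformers import CLIPVisionModel

class LazyVisionTower:
    def __init__(self, vision_tower_name, delay_load=True):
        self.vision_tower_name = vision_tower_name
        self.is_loaded = False
        if not delay_load:
            self.load_model()

    def load_model(self):
        # Called by the model builder *after* the language-model checkpoint has
        # already been loaded, so from_pretrained never registers these parameters,
        # which is why it prints the "weights were not used" warning.
        self.vision_tower = CLIPVisionModel.from_pretrained(self.vision_tower_name)
        self.vision_tower.requires_grad_(False)
        self.is_loaded = True

Either way, the vision tower ends up with real weights once the builder calls load_model(), so the warning printed at from_pretrained time can be ignored.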

jongwoopark7978 commented 2 weeks ago

Thanks for the detailed answers. I really appreciate it.

Sorry, but one more question. Based on your answer, I should run video_demo.sh on the inference branch. When should the video_inference branch be used, then?

ZhangYuanhan-AI commented 2 weeks ago

You can see the video_inference branch as a dev branch; it will be deleted soon.