OpenGVLab / Ask-Anything

[CVPR2024 Highlight][VideoChatGPT] ChatGPT with video understanding! And many more supported LMs such as miniGPT4, StableLM, and MOSS.
https://vchat.opengvlab.com/

Problems when testing the performance of stage-2 models #243

Closed: Qnancy closed this 1 month ago

Qnancy commented 1 month ago

I want to test the performance of videochat2_mistral on the dataset after visual-language alignment in stage 2, so I set the checkpoint path to None, but the inference results contain repeated garbled characters.

The model initialization code follows demo_mistral.ipynb:

# load stage2 model
cfg.model.vision_encoder.num_frames = 4
model = VideoChat2_it_mistral(config=cfg.model)

# add lora to run stage3 model
peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, inference_mode=False, 
    r=16, lora_alpha=32, lora_dropout=0.,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
         "gate_proj", "up_proj", "down_proj", "lm_head"
    ]
)
model.mistral_model = get_peft_model(model.mistral_model, peft_config)

# Note: no state_dict is loaded here; the loading code is commented out
# state_dict = torch.load("your_model_path/videochat2/videochat2_mistral_7b_stage3.pth", "cpu")
# if 'model' in state_dict.keys():
#     msg = model.load_state_dict(state_dict['model'], strict=False)
# else:
#     msg = model.load_state_dict(state_dict, strict=False)
# print(msg)

But the result generated by inference contains repeated garbled characters (see the attached screenshot).
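For reference, the stage-2 weights can be loaded the same way the stage-3 snippet above does, just without the LoRA wrapping. A minimal sketch; the stage-2 checkpoint filename is an assumption, mirroring the stage-3 naming above:

import torch

# load stage2 model without LoRA
cfg.model.vision_encoder.num_frames = 4
model = VideoChat2_it_mistral(config=cfg.model)

# load the stage-2 checkpoint (hypothetical filename, same layout as the stage-3 path)
state_dict = torch.load("your_model_path/videochat2/videochat2_mistral_7b_stage2.pth", map_location="cpu")
if "model" in state_dict.keys():
    msg = model.load_state_dict(state_dict["model"], strict=False)
else:
    msg = model.load_state_dict(state_dict, strict=False)
print(msg)  # inspect missing/unexpected keys to confirm the load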

I found that there is no LoRA in the stage-2 model, so I commented out the code that adds LoRA:

# load stage2 model
cfg.model.vision_encoder.num_frames = 4
model = VideoChat2_it_mistral(config=cfg.model)

# add lora to run stage3 model
# peft_config = LoraConfig(
#   task_type=TaskType.CAUSAL_LM, inference_mode=False, 
#    r=16, lora_alpha=32, lora_dropout=0.,
#    target_modules=[
#        "q_proj", "k_proj", "v_proj", "o_proj",
#         "gate_proj", "up_proj", "down_proj", "lm_head"
#    ]
#)
# model.mistral_model = get_peft_model(model.mistral_model, peft_config)

# Note: no state_dict is loaded here; the loading code is commented out
# state_dict = torch.load("your_model_path/videochat2/videochat2_mistral_7b_stage3.pth", "cpu")
# if 'model' in state_dict.keys():
#     msg = model.load_state_dict(state_dict['model'], strict=False)
# else:
#     msg = model.load_state_dict(state_dict, strict=False)
# print(msg)

But an error occurs during inference (see the attached screenshot).
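One quick way to tell whether a given checkpoint actually expects the LoRA modules is to inspect its keys before deciding how to build the model. A plain-PyTorch sketch, using the stage-3 path from the snippet above:

import torch

state_dict = torch.load("your_model_path/videochat2/videochat2_mistral_7b_stage3.pth", map_location="cpu")
sd = state_dict["model"] if "model" in state_dict else state_dict

# Any key containing "lora" means the checkpoint was saved with PEFT/LoRA wrapping,
# so the model must be wrapped with get_peft_model before load_state_dict
lora_keys = [k for k in sd if "lora" in k.lower()]
print(f"{len(lora_keys)} LoRA tensors found; sample: {lora_keys[:3]}")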

I don't know what went wrong here and would appreciate any help. Thank you very much!

Qnancy commented 1 month ago

I successfully disabled LoRA, but the inference answers still contain garbled text.
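If the remaining artifact is repeated text rather than a loading failure, standard Hugging Face decoding controls can dampen it. A hedged sketch on the underlying LM; whether the repo's answer helper forwards these kwargs, and the input_embeds variable holding the prepared video+text embeddings, are assumptions:

# generic transformers decoding knobs that penalize repetition
outputs = model.mistral_model.generate(
    inputs_embeds=input_embeds,   # video+text embeddings prepared by the pipeline (assumed)
    max_new_tokens=256,
    do_sample=False,              # greedy decoding is usually enough for captioning
    repetition_penalty=1.2,       # >1.0 discourages already-generated tokens
    no_repeat_ngram_size=3,       # block exact 3-gram repeats
)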

yinanhe commented 1 month ago

Hi, as you asked in the WeChat group, the stage-2 model is recommended for captioning rather than QA tasks. After removing the LoRA module it can generate video-to-text normally, but the conversational ability declines, as expected. If you have any questions, feel free to reopen this issue or continue asking in the WeChat group. Below is the WeChat QR code for GV Assistant. [WeChat QR code image]