PKU-YuanGroup / Video-LLaVA

【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
https://arxiv.org/pdf/2311.10122.pdf
Apache License 2.0

Finetuning with LoRA #99


joycech333 commented 8 months ago

After fine-tuning with LoRA using finetune_lora.sh (no video data, --bits 4, with the backbone and mm_mlp_adapter frozen), we load the model in image_inference.py with the following arguments:

    # Imports as used at the top of image_inference.py (videollava package)
    from videollava.conversation import conv_templates
    from videollava.mm_utils import get_model_name_from_path
    from videollava.model.builder import load_pretrained_model

    model_path = "./models/checkpoints/videollava-7b-lora"
    model_base = "lmsys/vicuna-7b-v1.5"
    cache_dir = "./cache_dir"
    device = 'cuda'
    load_4bit, load_8bit = True, False  # 4-bit quantization at load time
    model_name = get_model_name_from_path(model_path)
    tokenizer, model, processor, _ = load_pretrained_model(
        model_path, model_base, model_name,
        load_8bit, load_4bit=load_4bit, device=device, cache_dir=cache_dir)
    image_processor = processor['image']
    conv_mode = "llava_v1"
    conv = conv_templates[conv_mode].copy()
    roles = conv.roles

We get the warning:

    Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at lmsys/vicuna-7b-v1.5 and are newly initialized:
    ['model.mm_projector.2.weight', 'model.mm_projector.0.weight', 'model.mm_projector.0.bias', 'model.mm_projector.2.bias']
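This warning on its own may be expected at this stage: the base lmsys/vicuna-7b-v1.5 checkpoint has no multimodal projector, so those weights start out randomly initialized and should be filled in afterwards from the LoRA checkpoint. A quick way to confirm the fine-tuned projector was actually saved, assuming the LLaVA-style layout where it is written to non_lora_trainables.bin alongside the adapter (an assumption, not verified here):

    import torch

    # Hedged check: LLaVA-style LoRA checkpoints keep the trained projector
    # outside the adapter, in non_lora_trainables.bin; its keys should
    # mention mm_projector if fine-tuning saved it.
    nlt = torch.load(
        "./models/checkpoints/videollava-7b-lora/non_lora_trainables.bin",
        map_location="cpu")
    print([k for k in nlt if "mm_projector" in k])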

Then, after running the model on an image with image_inference.py, we get the following error at line 40 ("output_ids = model.generate("):

    FP4 quantization state not initialized. Please call .cuda() or .to(device) on the LinearFP4 layer first.
      File ".../bitsandbytes/nn/modules.py", line 256, in forward
        out = bnb.matmul_4bit(x, self.weight.t(), bias=bias, quant_state=self.weight.quant_state)
    AttributeError: 'Parameter' object has no attribute 'quant_state'

However, adding model.cuda() leads to the following shape mismatch:

        output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)
    RuntimeError: mat1 and mat2 shapes cannot be multiplied (313x4096 and 1x8388608)
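The second dimension is consistent with 4-bit packing: bitsandbytes stores two FP4 values per uint8 byte, so a 4096x4096 linear weight (matching the 4096 input features of A) flattens to exactly 1x8388608 packed bytes. That suggests the layer is multiplying against its raw packed storage because no valid quant_state was ever attached, which fits the missing-quant_state error above. A sanity check of the arithmetic:

    # Two 4-bit values fit in one uint8 byte, so a 4096x4096 weight
    # packs down to 4096*4096/2 bytes:
    assert 4096 * 4096 // 2 == 8388608  # the 1x8388608 seen in the error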

LinB203 commented 7 months ago

Sorry, we did not test --bits 4 during training; we only use it for inference.
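In other words, the intended workflow is to fine-tune in fp16/bf16 and request 4-bit quantization only when loading for inference. A minimal sketch of that path, assuming the LLaVA-style checkpoint layout (non_lora_trainables.bin next to the adapter) and that videollava.model exports LlavaLlamaForCausalLM; both are assumptions, not verified against the repo:

    import torch
    from transformers import AutoConfig
    from peft import PeftModel
    from videollava.model import LlavaLlamaForCausalLM  # assumption: LLaVA-style package layout

    lora_path = "./models/checkpoints/videollava-7b-lora"

    # Load the base LM with the fine-tuned config so the projector modules exist.
    cfg = AutoConfig.from_pretrained(lora_path)
    base = LlavaLlamaForCausalLM.from_pretrained(
        "lmsys/vicuna-7b-v1.5", config=cfg, torch_dtype=torch.float16)

    # Projector weights live outside the adapter in LLaVA-style LoRA checkpoints.
    nlt = torch.load(f"{lora_path}/non_lora_trainables.bin", map_location="cpu")
    nlt = {k.split("base_model.model.")[-1]: v for k, v in nlt.items()}
    base.load_state_dict(nlt, strict=False)

    # Merge the LoRA deltas into the fp16 weights and save a plain checkpoint;
    # 4-bit quantization can then be applied at load time via load_4bit=True.
    model = PeftModel.from_pretrained(base, lora_path)
    model = model.merge_and_unload()
    model.save_pretrained("./models/checkpoints/videollava-7b-merged")

This roughly mirrors what LLaVA-style load_pretrained_model implementations do internally for LoRA checkpoints; the point is that the merge happens in fp16, with quantization applied only afterwards.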

Adrian0999 commented 6 months ago

Same problem here, can anyone help?