After fine-tuning with LoRA using finetune_lora.sh (no video data, --bits 4, with the backbone and mm_mlp_adapter frozen) and then loading the model with image_inference.py, we get the warning:

```
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at lmsys/vicuna-7b-v1.5 and are newly initialized: ['model.mm_projector.2.weight', 'model.mm_projector.0.weight', 'model.mm_projector.0.bias', 'model.mm_projector.2.bias']
```
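This warning means the fine-tuned mm_projector weights were not found in the checkpoint being loaded, so the projector is randomly re-initialized. As a sketch of one possible fix (assuming the LLaVA convention of saving the projector separately as non_lora_trainables.bin next to the LoRA adapter; the path below is a placeholder, and `model` is the instance loaded by image_inference.py), the projector weights can be restored manually:

```python
import torch

# Assumption: LoRA training wrote the non-LoRA trainables (the projector)
# to non_lora_trainables.bin in the output directory (LLaVA convention).
non_lora = torch.load("checkpoints/my-lora-run/non_lora_trainables.bin",
                      map_location="cpu")
# PEFT-wrapped models save keys with a "base_model.model." prefix; strip it
# so the keys line up with the unwrapped model's state dict.
non_lora = {k.replace("base_model.model.", "", 1): v for k, v in non_lora.items()}
# strict=False: we only want to fill in the mm_projector weights.
model.load_state_dict(non_lora, strict=False)
```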
Then, running the model on an image with image_inference.py fails at line 40 (`output_ids = model.generate(`). bitsandbytes first prints:

```
FP4 quantization state not initialized. Please call .cuda() or .to(device) on the LinearFP4 layer first.
```

and then raises:

```
File ".../bitsandbytes/nn/modules.py", line 256, in forward
    out = bnb.matmul_4bit(x, self.weight.t(), bias=bias, quant_state=self.weight.quant_state)
AttributeError: 'Parameter' object has no attribute 'quant_state'
```
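The AttributeError says that at least one 4-bit layer's weight is a plain nn.Parameter rather than a bitsandbytes Params4bit that has been quantized (quant_state is only attached when the weight is moved to the GPU). A quick diagnostic (a sketch, assuming the model uses bitsandbytes' Linear4bit modules) is to scan for the offending layers:

```python
import bitsandbytes as bnb

# Sketch: list every 4-bit linear whose weight never received a quant_state,
# i.e. was loaded as a plain Parameter or never quantized by a move to GPU.
for name, module in model.named_modules():
    if isinstance(module, bnb.nn.Linear4bit):
        if getattr(module.weight, "quant_state", None) is None:
            print(f"{name}: 4-bit weight has no quant_state")
```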
However, adding model.cuda() leads to the following shape mismatch:

```
output = torch.nn.functional.linear(A, F.dequantize_4bit(B, quant_state).to(A.dtype).t(), bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (313x4096 and 1x8388608)
```
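The numbers are consistent with 4-bit packing: 8388608 = 4096 × 4096 / 2, which is exactly the flattened (n/2, 1) uint8 storage bitsandbytes uses to hold two 4-bit values per byte. In other words, dequantize_4bit is returning the packed storage shape instead of the original (4096, 4096) weight, which would happen if the quant_state does not match the packed data (for example, if .cuda() re-quantized bytes that were already packed). For reference, a sketch of the intended round trip through bitsandbytes.functional (assuming a CUDA device is available):

```python
import torch
import bitsandbytes.functional as F

# Quantize a 4096x4096 fp16 matrix to FP4. The packed storage holds two
# 4-bit values per byte, so its shape is (4096*4096/2, 1) = (8388608, 1),
# matching the "1x8388608" in the error once .t() transposes it.
w = torch.randn(4096, 4096, dtype=torch.float16, device="cuda")
packed, quant_state = F.quantize_4bit(w, quant_type="fp4")
print(packed.shape)  # torch.Size([8388608, 1])

# quant_state records the original shape; dequantizing with a quant_state
# that matches the packed data restores the (4096, 4096) weight.
restored = F.dequantize_4bit(packed, quant_state)
print(restored.shape)  # torch.Size([4096, 4096])
```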