DAMO-NLP-SG / VideoLLaMA2

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Apache License 2.0

Error while loading a custom fine-tuned QLoRA model in 4-bit: size mismatch for model.mm_projector.readout.0.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]). #71

Open ApoorvFrontera opened 1 month ago

ApoorvFrontera commented 1 month ago

Hi Team,

I have successfully fine-tuned a QLoRA adapter on a custom dataset. When I load the model in full precision, it loads and works well.

However, running inference in full precision takes too much time and GPU memory, so I want to load the model in 4-bit precision by passing the load_4bit parameter:

# import paths assumed from the repo layout (videollama2/mm_utils.py, videollama2/model/builder.py)
from videollama2.mm_utils import get_model_name_from_path
from videollama2.model.builder import load_pretrained_model

model_path = '/home/apoorv/development/videollama2/VideoLLaMA2/work_dirs/videollama2_vllava/finetune_videollama2_vllava_qlora'
model_name = get_model_name_from_path(model_path)
tokenizer, model, processor, context_len = load_pretrained_model(model_path, None, model_name, load_4bit=True)

While running this I get the following error:

RuntimeError: Error(s) in loading state_dict for Videollama2MistralForCausalLM:
    size mismatch for model.mm_projector.readout.0.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]).
    size mismatch for model.mm_projector.readout.2.weight: copying a param with shape torch.Size([4096, 4096]) from checkpoint, the shape in current model is torch.Size([8388608, 1]).

I have debugged this and understood the reason: when load_4bit=True is passed, the VideoLLaMA2 model parameters are created as 4-bit params. The LLM weights are initialized from the base model (in this case mistralai/Mistral-7B-Instruct-v0.2) in 4-bit, but the mm_projector is not (I am guessing it is initialized with random values, wrapped in the Params4bit class).

Line: https://github.com/DAMO-NLP-SG/VideoLLaMA2/blob/main/videollama2/model/builder.py#L76
model = Videollama2MistralForCausalLM.from_pretrained(model_base, low_cpu_mem_usage=True, config=lora_cfg_pretrained, **kwargs)

Then, to initialize the mm_projector weights, the builder tries to load them from the previously saved non_lora_trainables.bin, which was stored in full precision.

Line: https://github.com/DAMO-NLP-SG/VideoLLaMA2/blob/main/videollama2/model/builder.py#L101
model.load_state_dict(non_lora_trainables, strict=False)
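
To confirm the mechanism, here is a minimal repro sketch of my own (assuming bitsandbytes is installed and a CUDA device is available, and using bnb.nn.Linear4bit directly rather than the actual builder code): moving a Linear4bit layer to the GPU packs its weight into a uint8 buffer, after which a full-precision state dict can no longer be copied into it.

import torch
import bitsandbytes as bnb

# Moving a Linear4bit layer to the GPU quantizes its weight into a packed uint8 buffer.
layer = bnb.nn.Linear4bit(4096, 4096, bias=True).cuda()
print(layer.weight.shape)  # torch.Size([8388608, 1]): 4-bit values packed two per byte

# Copying a full-precision [4096, 4096] tensor into it raises the same size mismatch;
# load_state_dict reports size mismatches even with strict=False.
layer.load_state_dict({'weight': torch.randn(4096, 4096)}, strict=False)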

Debugging Outputs

Model params

(Pdb) model.model.mm_projector.readout
Sequential(
  (0): Linear4bit(in_features=4096, out_features=4096, bias=True)
  (1): GELU(approximate='none')
  (2): Linear4bit(in_features=4096, out_features=4096, bias=True)
)
(Pdb) model.model.mm_projector.readout[0]
Linear4bit(in_features=4096, out_features=4096, bias=True)
(Pdb) model.model.mm_projector.readout[0].weight
Parameter containing:
Parameter(Params4bit([[193],
            [108],
            [250],
            ...,
            [231],
            [107],
            [ 92]], device='cuda:1', dtype=torch.uint8))
(Pdb) model.model.mm_projector.readout[0].weight.shape
torch.Size([8388608, 1])

Previously saved non_lora_trainables param

(Pdb) non_lora_trainables[ 'model.model.mm_projector.readout.0.weight'].shape
torch.Size([4096, 4096])
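
The two shapes are consistent with 4-bit packing: the 4096 x 4096 full-precision projector weights are stored as two 4-bit values per uint8 byte once quantized. A quick sanity check (my own note, not part of the original debug session):

assert 4096 * 4096 // 2 == 8388608  # 16,777,216 weights packed two-per-byte into a [8388608, 1] uint8 buffer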

Please advise on how to resolve this.

ApoorvFrontera commented 3 weeks ago

Hi Team,

Any help is highly appreciated.

Thanks :)

clownrat6 commented 2 weeks ago

You can check this related issue: https://github.com/DAMO-NLP-SG/VideoLLaMA2/issues/78