Issue with Loading 13B Model: Size Mismatch Error

Hi,

Thank you for the great work and the detailed documentation you have provided. It's been very helpful.

I'm trying to use the 13B model instead of the default 7B model. I downloaded the 13B model from the provided Dropbox folder and replaced the modeling_llama.py file with the copy from that folder. However, that copy appears to be identical to the default one in the repository.
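
For anyone who wants to repeat the comparison of the two modeling_llama.py files, here is a minimal sketch of one way to check whether two files are byte-identical. The helper names `sha256_of` and `files_identical` are my own and are not part of the repository; the paths in the trailing comment are placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_identical(path_a: str, path_b: str) -> bool:
    """True when both files have the same SHA-256 digest."""
    return sha256_of(path_a) == sha256_of(path_b)


# Example call (placeholder paths, adjust to your checkout and download folder):
# files_identical("repo/modeling_llama.py", "dropbox_13b/modeling_llama.py")
```

If the two digests match, swapping the file cannot be what selects the 13B architecture, so the model size must be determined somewhere else.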

When I try to load the 13B model, I get the following error message:

You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
<class 'peft.tuners.lora.LoraModel'>
Traceback (most recent call last):
  File "ltu_main/src/ltu/train_script/../finetune_low_resource.py", line 285, in <module>
    fire.Fire(train)
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "code/forks/ltu_main/src/ltu/train_script/../finetune_low_resource.py", line 235, in train
    msg = model.load_state_dict(state_dict, strict=False)
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
    size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).

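The error is a size mismatch when loading the state_dict into PeftModelForCausalLM: the LoRA weights in the checkpoint have a hidden dimension of 5120, while the model being built expects 4096. Since 5120 and 4096 are the hidden sizes of LLaMA 13B and 7B respectively, it looks like the base model loaded from ../../../pretrained_mdls/vicuna_ltu/ is still the 7B one, even though the checkpoint is 13B.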
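
To make the mismatch easier to inspect, here is a minimal sketch of how the offending parameters could be listed before calling `load_state_dict`. The helper `find_shape_mismatches` is hypothetical (it is not part of the repository or of PyTorch), and it works on plain name-to-shape dictionaries so it runs without loading any weights. The example shapes are copied from the first line of the error above.

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def find_shape_mismatches(
    checkpoint_shapes: Dict[str, Shape],
    model_shapes: Dict[str, Shape],
) -> List[Tuple[str, Shape, Shape]]:
    """Return (name, checkpoint_shape, model_shape) for shared keys whose shapes differ."""
    mismatches = []
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Keys missing from the model are skipped, as strict=False would do.
        if model_shape is not None and model_shape != ckpt_shape:
            mismatches.append((name, ckpt_shape, model_shape))
    return mismatches


# Shapes taken from the first "size mismatch" line of the error message.
key = "base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight"
checkpoint_shapes = {key: (8, 5120)}
model_shapes = {key: (8, 4096)}

for name, ckpt_shape, model_shape in find_shape_mismatches(checkpoint_shapes, model_shapes):
    print(f"{name}: checkpoint {ckpt_shape} vs model {model_shape}")
```

With PyTorch, the two dictionaries could be built as `{k: tuple(v.shape) for k, v in sd.items()}`, where `sd` is either the result of `torch.load(path, map_location="cpu")` or `model.state_dict()`. Note that `strict=False` only tolerates missing or unexpected keys; a shape mismatch on a shared key still raises the RuntimeError shown above.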

Could you please guide me on how to correctly load the 13B model? Any help would be greatly appreciated.

Thank you.

Detailed logs:

```
Training Alpaca-LoRA model with params:
base_model: ./LTU_13B/stage4_all_mix_long_seq/checkpoint-20000/pytorch_model.bin
output_dir: ../exp/ltu_ft_toy_low_resource_2024-04-06_23-32-44/
batch_size: 256
micro_batch_size: 1
num_epochs: 1000
learning_rate: 0.0001
cutoff_len: 196
val_set_size: 0
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: True
wandb_project: ltu
wandb_run_name: ../exp/ltu_ft_toy_low_resource_2024-04-06_23-32-44/
wandb_watch: false
wandb_log_model: false
resume_from_checkpoint: False
prompt template: alpaca_short

Will load from ./LTU_13B/stage4_all_mix_long_seq/checkpoint-20000/pytorch_model.bin later, for implementation purpose, first load from ../../../pretrained_mdls/vicuna_ltu/
Loading checkpoint shards: 100%|██████████| 3/3 [00:12<00:00, 4.14s/it]
Some weights of the model checkpoint at ../../../pretrained_mdls/vicuna_ltu/ were not used when initializing LlamaForCausalLM: ['model.audio_encoder.blocks_v.5.norm2.weight', 'model.audio_encoder.blocks_a.9.attn.qkv.weight', 'model.audio_encoder.blocks_a.3.norm2.bias', 'model.audio_encoder.blocks_v.6.norm1_a.bias', 'model.audio_encoder.mlp_head.1.weight', 'model.audio_encoder.blocks_a.1.attn.qkv.bias', 'model.audio_encoder.mlp_head.0.bias', 'model.audio_encoder.blocks_a.4.norm1_v.weight', 'model.audio_encoder.blocks_a.10.norm2_v.weight', 'model.audio_encoder.blocks_v.10.attn.proj.weight', 'model.audio_encoder.blocks_v.4.norm1.bias', 'model.audio_encoder.blocks_v.0.norm2.bias', 'model.audio_encoder.blocks_v.1.mlp.fc1.bias', 'model.audio_encoder.blocks_a.9.norm1_a.weight', 'model.audio_encoder.blocks_a.6.norm2.bias', 'model.audio_encoder.blocks_a.6.attn.proj.weight', 'model.audio_encoder.blocks_a.6.norm1_v.weight', 
'model.audio_encoder.patch_embed_a.proj.weight', 'model.audio_encoder.blocks_a.7.norm2_a.bias', 'model.audio_encoder.blocks_v.1.mlp.fc2.weight', 'model.audio_encoder.blocks_a.5.norm2_a.weight', 'model.audio_encoder.blocks_v.6.attn.qkv.bias', 'model.audio_encoder.blocks_v.1.norm2_a.bias', 'model.audio_encoder.blocks_a.4.norm2_a.bias', 'model.audio_encoder.blocks_a.0.norm2_v.weight', 'model.audio_encoder.blocks_a.5.mlp.fc1.bias', 'model.audio_encoder.blocks_a.6.attn.qkv.bias', 'model.audio_encoder.blocks_a.6.norm2_v.bias', 'model.audio_encoder.blocks_v.8.norm2.weight', 'model.audio_encoder.blocks_v.1.attn.qkv.weight', 'model.audio_encoder.blocks_v.2.mlp.fc2.bias', 'model.audio_encoder.blocks_a.8.norm2.bias', 'model.audio_encoder.blocks_u.0.norm1_v.bias', 'model.audio_encoder.blocks_a.5.attn.qkv.bias', 'model.audio_encoder.blocks_v.10.attn.qkv.weight', 'model.audio_encoder.blocks_a.2.norm1_a.weight', 'model.audio_encoder.blocks_a.9.norm1.bias', 'model.audio_encoder.blocks_a.3.mlp.fc2.weight', 'model.audio_encoder.blocks_a.3.attn.proj.weight', 'model.audio_encoder.blocks_a.9.attn.proj.bias', 'model.audio_encoder.blocks_v.0.norm1.weight', 'model.audio_encoder.blocks_a.5.attn.proj.weight', 'model.audio_encoder.blocks_v.1.norm2_v.weight', 'model.audio_encoder.modality_a', 'model.audio_encoder.blocks_u.0.mlp.fc2.bias', 'model.audio_encoder.blocks_v.7.mlp.fc2.bias', 'model.audio_encoder.blocks_v.2.norm2.bias', 'model.audio_encoder.blocks_v.3.attn.proj.weight', 'model.audio_encoder.blocks_u.0.norm2_a.bias', 'model.audio_encoder.blocks_v.6.norm2.weight', 'model.audio_encoder.blocks_v.9.norm1_a.weight', 'model.audio_encoder.blocks_a.4.mlp.fc2.bias', 'model.audio_encoder.blocks_a.2.attn.proj.bias', 'model.audio_encoder.blocks_v.5.attn.qkv.weight', 'model.audio_encoder.blocks_u.0.attn.qkv.weight', 'model.audio_encoder.blocks_v.0.norm1.bias', 'model.audio_encoder.blocks_v.3.mlp.fc1.weight', 'model.audio_encoder.blocks_v.5.attn.proj.weight', 
'model.audio_encoder.blocks_a.9.norm1_v.weight', 'model.audio_encoder.norm_a.bias', 'model.audio_encoder.blocks_v.5.attn.qkv.bias', 'model.audio_encoder.blocks_u.0.norm2.bias', 'model.audio_encoder.blocks_v.0.norm1_a.weight', 'model.audio_encoder.blocks_v.7.attn.qkv.bias', 'model.audio_encoder.blocks_a.8.norm1.weight', 'model.audio_encoder.blocks_a.1.norm1_v.bias', 'model.audio_encoder.blocks_a.8.norm1_v.bias', 'model.audio_encoder.blocks_v.5.attn.proj.bias', 'model.audio_encoder.norm_v.bias', 'model.audio_encoder.blocks_v.6.norm2_a.weight', 'model.audio_encoder.blocks_a.0.norm1_v.weight', 'model.audio_encoder.blocks_v.7.norm2_a.weight', 'model.audio_encoder.blocks_v.6.mlp.fc1.weight', 'model.audio_encoder.blocks_a.8.attn.proj.bias', 'model.audio_encoder.blocks_a.0.norm2_a.weight', 'model.audio_encoder.blocks_u.0.mlp.fc1.bias', 'model.audio_encoder.blocks_a.4.norm2_v.bias', 'model.audio_encoder.blocks_v.9.attn.proj.weight', 'model.audio_encoder.blocks_v.2.norm2_v.weight', 'model.audio_encoder.blocks_v.9.norm2.bias', 'model.audio_encoder.blocks_v.2.mlp.fc1.bias', 'model.audio_encoder.blocks_v.4.norm2_a.weight', 'model.audio_encoder.blocks_v.8.norm1_v.bias', 'model.audio_encoder.blocks_a.3.norm2_a.bias', 'model.audio_encoder.blocks_v.6.norm2.bias', 'model.audio_encoder.blocks_a.6.attn.proj.bias', 'model.audio_encoder.blocks_v.2.norm1_v.weight', 'model.audio_encoder.blocks_v.6.norm1.weight', 'model.audio_encoder.blocks_a.0.attn.qkv.weight', 'model.audio_encoder.blocks_u.0.norm1_v.weight', 'model.audio_encoder.blocks_v.8.mlp.fc2.bias', 'model.audio_encoder.blocks_v.8.attn.proj.weight', 'model.audio_encoder.blocks_a.1.attn.proj.bias', 'model.audio_encoder.blocks_a.8.mlp.fc1.bias', 'model.audio_encoder.blocks_a.10.norm2_a.weight', 'model.audio_encoder.blocks_v.1.attn.qkv.bias', 'model.audio_encoder.blocks_v.4.mlp.fc2.bias', 'model.audio_encoder.blocks_v.8.norm1.bias', 'model.audio_encoder.blocks_u.0.norm2_a.weight', 'model.audio_encoder.blocks_v.9.mlp.fc1.weight', 
'model.audio_encoder.blocks_a.4.mlp.fc1.weight', 'model.audio_encoder.blocks_v.9.attn.qkv.bias', 'model.audio_encoder.blocks_v.1.norm2.weight', 'model.audio_encoder.blocks_v.0.mlp.fc2.bias', 'model.audio_encoder.blocks_a.4.attn.qkv.bias', 'model.audio_encoder.blocks_a.0.mlp.fc1.weight', 'model.audio_encoder.modality_v', 'model.audio_encoder.blocks_v.8.norm1.weight', 'model.audio_encoder.blocks_a.2.norm2_a.bias', 'model.audio_encoder.blocks_a.4.norm1.bias', 'model.audio_encoder.blocks_a.8.norm2_a.bias', 'model.audio_encoder.blocks_a.9.norm1_a.bias', 'model.audio_encoder.blocks_v.3.norm1.weight', 'model.audio_encoder.blocks_v.9.norm1_v.bias', 'model.audio_encoder.blocks_v.5.norm1_v.weight', 'model.audio_encoder.blocks_v.10.norm1_a.bias', 'model.audio_encoder.blocks_v.8.norm2_v.weight', 'model.audio_encoder.blocks_a.7.norm1.bias', 'model.audio_encoder.blocks_a.2.norm2_a.weight', 'model.audio_encoder.blocks_v.5.norm1.weight', 'model.audio_encoder.blocks_v.7.mlp.fc2.weight', 'model.audio_encoder.blocks_v.10.mlp.fc2.bias', 'model.audio_encoder.blocks_a.6.norm2_a.weight', 'model.audio_encoder.blocks_a.0.norm2_v.bias', 'model.audio_encoder.blocks_a.6.mlp.fc2.bias', 'model.audio_encoder.blocks_a.4.norm2_v.weight', 'model.audio_encoder.blocks_a.9.norm2.bias', 'model.audio_encoder.blocks_v.8.attn.qkv.bias', 'model.audio_encoder.blocks_a.1.norm2_v.bias', 'model.audio_encoder.blocks_v.2.attn.qkv.bias', 'model.audio_encoder.blocks_a.8.norm2_a.weight', 'model.audio_encoder.blocks_a.1.norm2_a.weight', 'model.audio_encoder.blocks_a.6.mlp.fc2.weight', 'model.audio_encoder.blocks_a.3.mlp.fc1.bias', 'model.audio_encoder.blocks_v.2.norm1.weight', 'model.audio_encoder.mlp_head.0.weight', 'model.audio_encoder.blocks_a.0.mlp.fc1.bias', 'model.audio_encoder.blocks_a.3.mlp.fc2.bias', 'model.audio_encoder.blocks_a.5.mlp.fc1.weight', 'model.audio_encoder.blocks_a.7.norm1_v.weight', 'model.audio_encoder.blocks_a.3.norm1.weight', 'model.audio_encoder.blocks_a.0.mlp.fc2.bias', 
'model.audio_encoder.blocks_a.2.norm2_v.bias', 'model.audio_encoder.blocks_a.4.attn.proj.bias', 'model.audio_encoder.blocks_v.3.norm2_a.weight', 'model.audio_encoder.blocks_a.6.norm1.weight', 'model.audio_encoder.blocks_a.9.norm1_v.bias', 'model.audio_encoder.blocks_a.10.norm1_v.weight', 'model.audio_encoder.blocks_v.9.norm2_a.weight', 'model.audio_encoder.blocks_v.1.norm2_v.bias', 'model.audio_encoder.blocks_v.2.norm2_a.bias', 'model.audio_encoder.blocks_a.7.norm2.weight', 'model.audio_encoder.blocks_a.10.norm1_a.bias', 'model.audio_encoder.blocks_v.6.mlp.fc1.bias', 'model.audio_encoder.blocks_a.4.norm2_a.weight', 'model.audio_encoder.blocks_v.2.mlp.fc2.weight', 'model.audio_encoder.blocks_v.7.norm2_a.bias', 'model.audio_encoder.blocks_v.4.attn.qkv.bias', 'model.audio_encoder.blocks_v.0.norm1_v.bias', 'model.audio_encoder.blocks_a.1.attn.proj.weight', 'model.audio_encoder.blocks_a.1.norm1_a.weight', 'model.audio_encoder.blocks_u.0.attn.qkv.bias', 'model.audio_encoder.blocks_v.5.norm2.bias', 'model.audio_encoder.blocks_a.8.attn.qkv.weight', 'model.audio_encoder.blocks_a.5.attn.proj.bias', 'model.audio_encoder.blocks_v.1.norm1.bias', 'model.audio_encoder.blocks_a.2.attn.proj.weight', 'model.audio_encoder.blocks_a.1.norm1_v.weight', 'model.audio_encoder.blocks_a.9.norm2_v.weight', 'model.audio_encoder.blocks_a.7.mlp.fc2.weight', 'model.audio_encoder.blocks_a.4.norm1_a.weight', 'model.audio_encoder.blocks_a.4.mlp.fc1.bias', 'model.audio_encoder.blocks_v.8.mlp.fc1.bias', 'model.audio_encoder.blocks_a.10.norm2_a.bias', 'model.audio_encoder.blocks_a.10.norm1_a.weight', 'model.audio_encoder.blocks_v.4.attn.proj.bias', 'model.audio_encoder.blocks_v.1.norm1_a.bias', 'model.audio_encoder.blocks_v.10.attn.proj.bias', 'model.audio_encoder.blocks_v.8.mlp.fc2.weight', 'model.audio_encoder.blocks_v.2.norm2_a.weight', 'model.audio_encoder.norm.bias', 'model.audio_encoder.blocks_a.0.norm2_a.bias', 'model.audio_encoder.blocks_v.5.mlp.fc2.weight', 
'model.audio_encoder.blocks_v.3.attn.qkv.weight', 'model.audio_encoder.blocks_v.0.attn.proj.weight', 'model.audio_encoder.blocks_v.4.norm1_a.bias', 'model.audio_encoder.blocks_a.0.attn.proj.weight', 'model.audio_encoder.blocks_a.0.norm2.weight', 'model.audio_encoder.blocks_v.1.mlp.fc2.bias', 'model.audio_encoder.blocks_v.2.norm1_a.bias', 'model.audio_encoder.blocks_v.3.norm1_a.weight', 'model.audio_encoder.blocks_a.2.mlp.fc1.bias', 'model.audio_encoder.blocks_v.8.norm2_v.bias', 'model.audio_encoder.blocks_a.7.norm1_a.weight', 'model.audio_encoder.blocks_a.9.norm1.weight', 'model.audio_encoder.blocks_a.2.norm1.weight', 'model.audio_encoder.blocks_v.10.norm1_v.weight', 'model.audio_encoder.blocks_a.3.norm1.bias', 'model.audio_encoder.blocks_a.4.norm1_v.bias', 'model.audio_encoder.blocks_v.3.norm1_v.weight', 'model.audio_encoder.blocks_v.3.mlp.fc2.bias', 'model.audio_encoder.blocks_a.7.mlp.fc1.bias', 'model.audio_encoder.blocks_a.5.norm2_v.bias', 'model.audio_encoder.blocks_v.2.attn.qkv.weight', 'model.audio_encoder.blocks_v.3.attn.qkv.bias', 'model.audio_encoder.blocks_a.2.attn.qkv.bias', 'model.audio_encoder.blocks_a.8.mlp.fc2.bias', 'model.audio_encoder.blocks_a.10.attn.qkv.weight', 'model.audio_encoder.blocks_a.2.norm2.bias', 'model.audio_encoder.blocks_v.9.norm2_v.bias', 'model.audio_encoder.blocks_a.5.norm2_a.bias', 'model.audio_encoder.blocks_v.10.attn.qkv.bias', 'model.audio_encoder.blocks_v.10.norm2_a.weight', 'model.audio_encoder.pos_embed_a', 'model.audio_encoder.blocks_a.4.mlp.fc2.weight', 'model.audio_encoder.blocks_v.9.attn.proj.bias', 'model.audio_encoder.blocks_v.6.mlp.fc2.bias', 'model.audio_encoder.blocks_a.2.norm2.weight', 'model.audio_encoder.blocks_v.4.norm1_a.weight', 'model.audio_encoder.blocks_a.6.norm2.weight', 'model.audio_encoder.blocks_a.8.norm1_v.weight', 'model.audio_encoder.blocks_v.6.norm2_v.weight', 'model.audio_encoder.blocks_v.0.norm1_a.bias', 'model.audio_encoder.blocks_a.5.norm1_v.weight', 
'model.audio_encoder.blocks_v.4.mlp.fc2.weight', 'model.audio_encoder.pos_embed_v', 'model.audio_encoder.blocks_a.7.attn.qkv.bias', 'model.audio_encoder.blocks_a.0.attn.proj.bias', 'model.audio_encoder.blocks_v.9.norm2.weight', 'model.audio_encoder.blocks_a.8.norm2_v.bias', 'model.audio_encoder.blocks_v.4.norm1_v.bias', 'model.audio_encoder.blocks_a.9.mlp.fc2.bias', 'model.audio_encoder.blocks_v.3.mlp.fc2.weight', 'model.audio_encoder.blocks_v.8.norm2_a.weight', 'model.audio_encoder.blocks_a.9.attn.proj.weight', 'model.audio_encoder.blocks_v.5.norm1_v.bias', 'model.audio_encoder.blocks_v.7.mlp.fc1.bias', 'model.audio_encoder.blocks_v.0.attn.proj.bias', 'model.audio_encoder.blocks_a.3.norm1_v.bias', 'model.audio_encoder.blocks_a.5.norm2.weight', 'model.audio_encoder.blocks_a.6.mlp.fc1.weight', 'model.audio_encoder.blocks_u.0.mlp.fc1.weight', 'model.audio_encoder.blocks_v.10.norm2_a.bias', 'model.audio_encoder.blocks_v.9.norm2_a.bias', 'model.audio_encoder.blocks_a.3.norm2_a.weight', 'model.audio_encoder.blocks_a.9.mlp.fc2.weight', 'model.audio_encoder.blocks_a.5.mlp.fc2.bias', 'model.audio_encoder.blocks_a.6.norm2_v.weight', 'model.audio_encoder.blocks_v.3.attn.proj.bias', 'model.audio_encoder.blocks_a.3.norm2_v.bias', 'model.audio_encoder.blocks_v.5.mlp.fc1.weight', 'model.audio_encoder.blocks_a.0.mlp.fc2.weight', 'model.audio_encoder.blocks_v.10.norm2.weight', 'model.audio_encoder.blocks_v.7.norm1_v.bias', 'model.audio_encoder.blocks_v.7.norm2.weight', 'model.audio_encoder.blocks_a.10.attn.proj.bias', 'model.audio_encoder.blocks_v.6.attn.proj.bias', 'model.audio_encoder.blocks_a.8.attn.qkv.bias', 'model.audio_encoder.blocks_v.1.attn.proj.weight', 'model.audio_encoder.blocks_u.0.norm2_v.bias', 'model.audio_encoder.blocks_v.8.norm1_v.weight', 'model.audio_encoder.blocks_a.9.mlp.fc1.weight', 'model.audio_encoder.blocks_v.1.norm2_a.weight', 'model.audio_encoder.blocks_a.10.mlp.fc1.weight', 'model.audio_encoder.blocks_a.7.norm1.weight', 
'model.audio_encoder.blocks_v.6.norm1_v.weight', 'model.audio_encoder.blocks_v.5.norm1.bias', 'model.audio_encoder.blocks_a.7.attn.proj.weight', 'model.audio_encoder.blocks_a.10.norm1.weight', 'model.audio_encoder.blocks_v.0.norm2_a.weight', 'model.audio_encoder.blocks_v.9.norm1_a.bias', 'model.audio_encoder.blocks_v.8.attn.proj.bias', 'model.audio_encoder.blocks_a.5.norm2_v.weight', 'model.audio_encoder.blocks_u.0.norm1.weight', 'model.audio_encoder.blocks_a.2.mlp.fc2.bias', 'model.audio_encoder.blocks_a.1.norm2.bias', 'model.audio_encoder.blocks_v.0.mlp.fc2.weight', 'model.audio_encoder.blocks_a.4.attn.qkv.weight', 'model.audio_encoder.blocks_u.0.norm1_a.weight', 'model.audio_encoder.blocks_a.1.mlp.fc2.weight', 'model.audio_encoder.blocks_a.5.norm1_v.bias', 'model.audio_encoder.blocks_a.4.norm2.weight', 'model.audio_encoder.blocks_v.7.mlp.fc1.weight', 'model.audio_encoder.blocks_a.7.attn.proj.bias', 'model.audio_encoder.blocks_v.0.mlp.fc1.bias', 'model.audio_encoder.blocks_a.2.norm1_v.bias', 'model.audio_encoder.blocks_v.4.mlp.fc1.bias', 'model.audio_encoder.blocks_a.1.norm1_a.bias', 'model.audio_encoder.norm_a.weight', 'model.audio_encoder.blocks_a.0.norm1_a.bias', 'model.audio_encoder.blocks_a.5.attn.qkv.weight', 'model.audio_encoder.blocks_v.2.norm2_v.bias', 'model.audio_encoder.blocks_a.2.mlp.fc1.weight', 'model.audio_encoder.blocks_v.10.norm1_a.weight', 'model.audio_encoder.patch_embed_v.proj.bias', 'model.audio_encoder.blocks_a.5.norm1_a.weight', 'model.audio_encoder.blocks_u.0.norm2_v.weight', 'model.audio_encoder.blocks_v.3.norm2_a.bias', 'model.audio_encoder.blocks_v.6.norm1_a.weight', 'model.audio_encoder.blocks_a.8.attn.proj.weight', 'model.audio_encoder.blocks_a.6.norm1_a.weight', 'model.audio_encoder.blocks_v.10.mlp.fc1.weight', 'model.audio_encoder.blocks_v.9.mlp.fc2.bias', 'model.audio_encoder.blocks_v.0.norm2_a.bias', 'model.audio_encoder.blocks_a.0.norm1_a.weight', 'model.audio_encoder.blocks_a.7.norm2.bias', 
'model.audio_encoder.blocks_v.10.norm2.bias', 'model.audio_encoder.blocks_a.8.mlp.fc1.weight', 'model.audio_encoder.blocks_v.3.norm2_v.bias', 'model.audio_encoder.blocks_a.6.norm1_a.bias', 'model.audio_encoder.blocks_a.8.norm1_a.weight', 'model.audio_encoder.blocks_a.0.norm2.bias', 'model.audio_encoder.blocks_a.6.mlp.fc1.bias', 'model.audio_encoder.blocks_a.1.norm2.weight', 'model.audio_encoder.blocks_a.6.attn.qkv.weight', 'model.audio_encoder.blocks_v.1.norm2.bias', 'model.audio_encoder.blocks_v.0.attn.qkv.bias', 'model.audio_encoder.blocks_v.8.norm1_a.weight', 'model.audio_encoder.blocks_v.7.norm2_v.bias', 'model.audio_encoder.blocks_a.2.norm1_v.weight', 'model.audio_encoder.blocks_a.3.norm1_a.weight', 'model.audio_encoder.blocks_a.10.norm2.bias', 'model.audio_encoder.blocks_a.7.mlp.fc2.bias', 'model.audio_encoder.mlp_head.1.bias', 'model.audio_encoder.blocks_v.4.norm2_v.weight', 'model.audio_encoder.blocks_a.1.attn.qkv.weight', 'model.audio_encoder.blocks_a.9.mlp.fc1.bias', 'model.audio_encoder.blocks_a.6.norm1.bias', 'model.audio_encoder.blocks_v.9.mlp.fc1.bias', 'model.audio_encoder.patch_embed_a.proj.bias', 'model.audio_encoder.patch_embed_v.proj.weight', 'model.audio_encoder.blocks_a.2.mlp.fc2.weight', 'model.audio_encoder.blocks_a.4.attn.proj.weight', 'model.audio_encoder.blocks_v.5.norm2_v.weight', 'model.audio_encoder.blocks_a.6.norm1_v.bias', 'model.audio_encoder.blocks_v.10.norm1.weight', 'model.audio_encoder.blocks_u.0.norm1_a.bias', 'model.audio_encoder.blocks_v.4.norm1.weight', 'model.audio_encoder.blocks_a.0.norm1_v.bias', 'model.audio_encoder.blocks_v.7.attn.proj.weight', 'model.audio_encoder.blocks_v.4.norm1_v.weight', 'model.audio_encoder.blocks_v.7.attn.proj.bias', 'model.audio_encoder.blocks_a.1.norm2_a.bias', 'model.audio_encoder.blocks_a.7.norm2_v.bias', 'model.audio_encoder.blocks_v.3.norm2.weight', 'model.audio_encoder.blocks_v.4.norm2_a.bias', 'model.audio_encoder.blocks_v.2.mlp.fc1.weight', 'model.audio_encoder.blocks_a.3.norm2.weight', 
'model.audio_encoder.blocks_v.1.norm1_v.bias', 'model.audio_encoder.blocks_v.2.attn.proj.bias', 'model.audio_encoder.blocks_v.8.norm2.bias', 'model.audio_encoder.blocks_u.0.norm2.weight', 'model.audio_encoder.blocks_v.10.norm1.bias', 'model.audio_encoder.blocks_a.10.mlp.fc2.bias', 'model.audio_encoder.blocks_a.5.norm1.weight', 'model.audio_encoder.blocks_v.9.norm2_v.weight', 'model.audio_encoder.blocks_v.4.norm2_v.bias', 'model.audio_encoder.blocks_v.7.norm1.bias', 'model.audio_encoder.blocks_v.0.attn.qkv.weight', 'model.audio_encoder.blocks_a.2.norm1.bias', 'model.audio_encoder.blocks_v.0.norm2_v.weight', 'model.audio_encoder.blocks_a.4.norm2.bias', 'model.audio_encoder.blocks_a.6.norm2_a.bias', 'model.audio_encoder.blocks_v.2.norm1.bias', 'model.audio_encoder.blocks_a.9.norm2_a.weight', 'model.audio_encoder.blocks_a.10.norm2.weight', 'model.audio_encoder.blocks_v.7.norm1_a.weight', 'model.audio_encoder.blocks_v.6.attn.qkv.weight', 'model.audio_encoder.blocks_a.10.mlp.fc1.bias', 'model.audio_encoder.blocks_v.0.norm2.weight', 'model.audio_encoder.blocks_v.6.norm2_v.bias', 'model.audio_encoder.blocks_u.0.norm1.bias', 'model.audio_encoder.blocks_a.9.attn.qkv.bias', 'model.audio_encoder.blocks_v.4.attn.qkv.weight', 'model.audio_encoder.blocks_v.5.mlp.fc1.bias', 'model.audio_encoder.blocks_a.4.norm1_a.bias', 'model.audio_encoder.blocks_a.2.norm2_v.weight', 'model.audio_encoder.blocks_v.3.mlp.fc1.bias', 'model.audio_encoder.blocks_v.6.norm2_a.bias', 'model.audio_encoder.blocks_v.4.norm2.weight', 'model.audio_encoder.blocks_v.10.mlp.fc2.weight', 'model.audio_encoder.blocks_a.1.norm1.weight', 'model.audio_encoder.blocks_v.3.norm1.bias', 'model.audio_encoder.blocks_v.8.norm1_a.bias', 'model.audio_encoder.blocks_a.8.norm1.bias', 'model.audio_encoder.blocks_a.0.norm1.weight', 'model.audio_encoder.blocks_a.8.norm1_a.bias', 'model.audio_encoder.blocks_a.10.attn.qkv.bias', 'model.audio_encoder.blocks_u.0.attn.proj.bias', 'model.audio_encoder.blocks_v.7.norm1_v.weight', 
'model.audio_encoder.blocks_v.3.norm1_v.bias', 'model.audio_encoder.blocks_a.8.norm2_v.weight', 'model.audio_encoder.blocks_a.7.norm1_a.bias', 'model.audio_encoder.blocks_v.1.norm1_a.weight', 'model.audio_encoder.blocks_v.3.norm2.bias', 'model.audio_encoder.blocks_a.3.attn.qkv.bias', 'model.audio_encoder.blocks_a.10.norm2_v.bias', 'model.audio_encoder.blocks_v.10.mlp.fc1.bias', 'model.audio_encoder.blocks_a.8.norm2.weight', 'model.audio_encoder.blocks_a.10.norm1.bias', 'model.audio_encoder.blocks_v.9.norm1_v.weight', 'model.audio_encoder.blocks_a.3.norm2_v.weight', 'model.audio_encoder.blocks_v.7.norm1_a.bias', 'model.audio_encoder.blocks_a.0.attn.qkv.bias', 'model.audio_encoder.blocks_a.3.attn.qkv.weight', 'model.audio_encoder.blocks_a.1.mlp.fc2.bias', 'model.audio_encoder.norm_v.weight', 'model.audio_encoder.blocks_v.3.norm2_v.weight', 'model.audio_encoder.blocks_v.5.norm2_v.bias', 'model.audio_encoder.blocks_v.9.norm1.bias', 'model.audio_encoder.blocks_a.9.norm2_a.bias', 'model.audio_encoder.blocks_a.5.mlp.fc2.weight', 'model.audio_encoder.blocks_a.7.norm2_a.weight', 'model.audio_encoder.blocks_a.5.norm1.bias', 'model.audio_encoder.blocks_v.3.norm1_a.bias', 'model.audio_encoder.blocks_v.6.norm1_v.bias', 'model.audio_encoder.blocks_v.7.norm2_v.weight', 'model.audio_encoder.blocks_v.0.norm1_v.weight', 'model.audio_encoder.blocks_a.8.mlp.fc2.weight', 'model.audio_encoder.blocks_v.0.mlp.fc1.weight', 'model.audio_encoder.blocks_v.5.norm2_a.weight', 'model.audio_encoder.blocks_a.7.norm2_v.weight', 'model.audio_encoder.blocks_v.10.norm2_v.weight', 'model.audio_encoder.blocks_v.2.attn.proj.weight', 'model.audio_encoder.blocks_v.0.norm2_v.bias', 'model.audio_encoder.blocks_a.4.norm1.weight', 'model.audio_encoder.blocks_v.7.norm1.weight', 'model.audio_encoder.blocks_v.4.attn.proj.weight', 'model.audio_encoder.blocks_a.3.attn.proj.bias', 'model.audio_encoder.blocks_a.2.norm1_a.bias', 'model.audio_encoder.blocks_v.4.norm2.bias', 
'model.audio_encoder.blocks_v.2.norm1_a.weight', 'model.audio_encoder.blocks_v.2.norm2.weight', 'model.audio_encoder.blocks_a.2.attn.qkv.weight', 'model.audio_encoder.blocks_v.6.mlp.fc2.weight', 'model.audio_encoder.blocks_v.8.attn.qkv.weight', 'model.audio_encoder.blocks_a.9.norm2_v.bias', 'model.audio_encoder.blocks_v.10.norm1_v.bias', 'model.audio_encoder.blocks_a.0.norm1.bias', 'model.audio_encoder.blocks_v.10.norm2_v.bias', 'model.audio_encoder.blocks_v.1.mlp.fc1.weight', 'model.audio_encoder.blocks_v.5.mlp.fc2.bias', 'model.audio_encoder.blocks_a.3.mlp.fc1.weight', 'model.audio_encoder.blocks_a.1.mlp.fc1.bias', 'model.audio_encoder.blocks_a.10.mlp.fc2.weight', 'model.audio_encoder.blocks_a.9.norm2.weight', 'model.audio_encoder.blocks_a.1.mlp.fc1.weight', 'model.audio_encoder.blocks_a.3.norm1_v.weight', 'model.audio_encoder.blocks_v.8.mlp.fc1.weight', 'model.audio_encoder.blocks_v.1.attn.proj.bias', 'model.audio_encoder.blocks_u.0.mlp.fc2.weight', 'model.audio_encoder.blocks_a.1.norm2_v.weight', 'model.audio_encoder.blocks_a.10.norm1_v.bias', 'model.audio_encoder.blocks_v.5.norm2_a.bias', 'model.audio_encoder.blocks_v.2.norm1_v.bias', 'model.audio_encoder.blocks_v.8.norm2_a.bias', 'model.audio_encoder.blocks_u.0.attn.proj.weight', 'model.audio_encoder.blocks_a.7.mlp.fc1.weight', 'model.audio_encoder.blocks_a.1.norm1.bias', 'model.audio_encoder.blocks_v.9.norm1.weight', 'model.audio_encoder.blocks_a.7.attn.qkv.weight', 'model.audio_encoder.blocks_a.5.norm1_a.bias', 'model.audio_encoder.norm.weight', 'model.audio_encoder.blocks_a.3.norm1_a.bias', 'model.audio_encoder.blocks_v.1.norm1.weight', 'model.audio_encoder.blocks_v.6.attn.proj.weight', 'model.audio_encoder.blocks_v.1.norm1_v.weight', 'model.audio_encoder.blocks_v.4.mlp.fc1.weight', 'model.audio_encoder.blocks_a.7.norm1_v.bias', 'model.audio_encoder.blocks_a.10.attn.proj.weight', 'model.audio_encoder.blocks_v.9.attn.qkv.weight', 'model.audio_encoder.blocks_a.5.norm2.bias', 
'model.audio_encoder.blocks_v.5.norm1_a.bias', 'model.audio_encoder.blocks_v.6.norm1.bias', 'model.audio_encoder.blocks_v.5.norm1_a.weight', 'model.audio_encoder.blocks_v.9.mlp.fc2.weight', 'model.audio_encoder.blocks_v.7.norm2.bias', 'model.audio_encoder.blocks_v.7.attn.qkv.weight'] - This IS expected if you are initializing LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model). - This IS NOT expected if you are initializing LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model). Some weights of LlamaForCausalLM were not initialized from the model checkpoint at ../../../pretrained_mdls/vicuna_ltu/ and are newly initialized: ['model.audio_encoder.time_tr.attn_ln.weight', 'model.audio_encoder.layer_tr.mlp.2.weight', 'model.audio_encoder.layer_tr.attn.out.bias', 'model.audio_encoder.time_tr.attn.query.bias', 'model.audio_encoder.time_tr.mlp_ln.bias', 'model.audio_encoder.time_tr.attn.query.weight', 'model.audio_encoder.layer_tr.mlp.0.weight', 'model.audio_encoder.layer_tr.mlp.2.bias', 'model.audio_encoder.time_tr.attn.out.weight', 'model.audio_encoder.time_tr.attn.key.weight', 'model.audio_encoder.time_tr.mlp_ln.weight', 'model.audio_proj.1.weight', 'model.audio_encoder.time_tr.attn.value.weight', 'model.audio_encoder.layer_tr.attn.query.bias', 'model.audio_encoder.layer_tr.attn_ln.bias', 'model.audio_encoder.layer_tr.mlp_ln.weight', 'model.audio_encoder.time_tr.mlp.2.weight', 'model.audio_encoder.layer_tr.attn.value.bias', 'model.audio_encoder.layer_tr.attn.out.weight', 'model.audio_encoder.time_tr.attn.out.bias', 'model.audio_encoder.time_tr.mlp.0.weight', 'model.audio_encoder.layer_tr.mlp.0.bias', 'model.audio_encoder.time_tr.mlp.0.bias', 'model.audio_encoder.time_tr.attn.value.bias', 
'model.audio_encoder.layer_tr.attn_ln.weight', 'model.audio_encoder.layer_tr.attn.value.weight', 'model.audio_encoder.layer_tr.mlp_ln.bias', 'model.audio_encoder.layer_tr.attn.query.weight', 'model.audio_encoder.layer_tr.attn.key.weight', 'model.audio_proj.1.bias', 'model.audio_encoder.time_tr.mlp.2.bias', 'model.audio_encoder.time_tr.attn_ln.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Traceback (most recent call last): File "ltu_main/src/ltu/train_script/../finetune_low_resource.py", line 285, in fire.Fire(train) File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 141, in Fire component_trace = _Fire(component, args, parsed_flag_args, context, name) File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire component, remaining_args = _CallAndUpdateTrace( File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace component = fn(*varargs, **kwargs) File "code/forks/ltu_main/src/ltu/train_script/../finetune_low_resource.py", line 235, in train msg = model.load_state_dict(state_dict, strict=False) File "conda/envs/venv_ltu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM: size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). 
size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). 
size mismatch for base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). 
size mismatch for base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
size mismatch for base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.audio_proj.1.weight: copying a param with shape torch.Size([5120, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1280]).
size mismatch for base_model.model.model.audio_proj.1.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
```
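
Since the log above is long, it may help to group the mismatches by shape pair instead of reading them one by one. Below is a minimal sketch that does this with the standard library only, assuming the error text is available as a string. `summarize_mismatches` and `parse_shape` are hypothetical helper names of my own and are not part of the ltu repository.

```python
import re
from collections import Counter

# Matches one PyTorch "size mismatch" message and captures the parameter
# name plus the two shapes it reports (checkpoint first, current model second).
MISMATCH_RE = re.compile(
    r"size mismatch for (?P<name>\S+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]*)\]\)"
)


def parse_shape(text):
    """Turn a string such as '8, 5120' into the tuple (8, 5120)."""
    return tuple(int(part) for part in text.split(",") if part.strip())


def summarize_mismatches(log_text):
    """Count how many parameters share each (checkpoint, model) shape pair."""
    counts = Counter()
    for match in MISMATCH_RE.finditer(log_text):
        pair = (parse_shape(match["ckpt"]), parse_shape(match["model"]))
        counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Two messages copied from the log above, used as sample input.
    sample = (
        "size mismatch for base_model.model.model.audio_proj.1.weight: "
        "copying a param with shape torch.Size([5120, 768]) from checkpoint, "
        "the shape in current model is torch.Size([4096, 1280]). "
        "size mismatch for base_model.model.model.audio_proj.1.bias: "
        "copying a param with shape torch.Size([5120]) from checkpoint, "
        "the shape in current model is torch.Size([4096])."
    )
    for (ckpt, model), n in summarize_mismatches(sample).items():
        print(f"{n} param(s): checkpoint {ckpt} vs. current model {model}")
```

Read this way, the log contains only four distinct shape pairs. The LoRA weights and the `audio_proj.1.bias` have 5120 in the checkpoint where the current model has 4096, which are the hidden sizes of LLaMA-13B and LLaMA-7B respectively. `audio_proj.1.weight` also differs in its second dimension (768 in the checkpoint, 1280 in the current model), so the audio feature size does not match either. My guess, which I have not confirmed, is that the script is still building the 7B configuration when it loads the 13B checkpoint.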
