YuanGongND / ltu

Code, Dataset, and Pretrained Models for Audio and Speech Large Language Model "Listen, Think, and Understand".
337 stars, 27 forks

Issue with Loading 13B Model: Size Mismatch Error #26

Open · EnisBerk opened this issue 3 months ago

EnisBerk commented 3 months ago

Hi,

Thank you for the great work and the detailed documentation you have provided. It's been very helpful.

I'm trying to use the 13B model instead of the default 7B model. I downloaded the 13B model from the provided Dropbox folder and tried replacing the `modeling_llama.py` file with the one from the Dropbox folder. However, the Dropbox copy appears to be identical to the default one in the repository.
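A byte-for-byte comparison is a quick way to confirm whether two copies of `modeling_llama.py` really are the same. This is a minimal sketch using only the standard library; the helper names are hypothetical, and temporary files stand in for the repository and Dropbox copies so the snippet runs on its own.

```python
import filecmp
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_contents(path_a, path_b):
    """Byte-for-byte comparison; shallow=False makes filecmp read the file
    contents instead of trusting os.stat() signatures alone."""
    return filecmp.cmp(path_a, path_b, shallow=False)


# Self-contained demo: temporary files stand in for the two copies.
with tempfile.TemporaryDirectory() as tmp:
    repo_copy = Path(tmp) / "repo_modeling_llama.py"
    dropbox_copy = Path(tmp) / "dropbox_modeling_llama.py"
    repo_copy.write_text("print('hello')\n")
    dropbox_copy.write_text("print('hello')\n")
    print(same_contents(repo_copy, dropbox_copy))  # True
    print(sha256_of(repo_copy) == sha256_of(dropbox_copy))  # True
```

To run the real check, replace the temporary paths with the locations of the repository copy and the downloaded copy.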

When I try to load the 13B model, I get the following error message:

```
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
<class 'peft.tuners.lora.LoraModel'>
Traceback (most recent call last):
  File "ltu_main/src/ltu/train_script/../finetune_low_resource.py", line 285, in <module>
    fire.Fire(train)
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "code/forks/ltu_main/src/ltu/train_script/../finetune_low_resource.py", line 235, in train
    msg = model.load_state_dict(state_dict, strict=False)
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
    size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
```

The error occurs when loading the `state_dict` into the `PeftModelForCausalLM`: the LoRA weights in the checkpoint have a hidden dimension of 5120, whereas the current model expects 4096. Since 5120 is the hidden size of LLaMA-13B and 4096 that of LLaMA-7B, this suggests the base model being instantiated is still the 7B one.
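A minimal sketch (not part of the LTU codebase; helper names are hypothetical) for diagnosing this kind of mismatch before calling `load_state_dict`. Note that `strict=False` only relaxes the check on missing and unexpected keys; PyTorch still raises on shape mismatches, which is why the call at line 235 fails. The sketch assumes PEFT-style LoRA keys ending in `q_proj.lora_A.default.weight`, whose shape is `[r, hidden_size]` for LLaMA's `q_proj`.

```python
import torch


def find_shape_mismatches(model_sd, ckpt_sd):
    """Return (key, checkpoint_shape, model_shape) for every key present in
    both state dicts whose tensor shapes differ."""
    mismatches = []
    for key, ckpt_tensor in ckpt_sd.items():
        if key in model_sd and model_sd[key].shape != ckpt_tensor.shape:
            mismatches.append((key, tuple(ckpt_tensor.shape), tuple(model_sd[key].shape)))
    return mismatches


def lora_hidden_size(sd):
    """Infer the base model's hidden size from the first q_proj LoRA A matrix
    (shape [r, hidden_size]). Returns None if no such key exists."""
    for key, tensor in sd.items():
        if key.endswith("q_proj.lora_A.default.weight"):
            return tensor.shape[1]
    return None


# Toy state dicts mimicking the shapes in the traceback above (r = 8).
key_a = "base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight"
key_b = "base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight"
ckpt_sd = {key_a: torch.zeros(8, 5120), key_b: torch.zeros(5120, 8)}
model_sd = {key_a: torch.zeros(8, 4096), key_b: torch.zeros(4096, 8)}

for key, got, expected in find_shape_mismatches(model_sd, ckpt_sd):
    print(f"{key}: checkpoint {got} vs model {expected}")
print("checkpoint hidden size:", lora_hidden_size(ckpt_sd))  # 5120
print("model hidden size:", lora_hidden_size(model_sd))  # 4096
```

In a real run, `model_sd` would come from `model.state_dict()` and `ckpt_sd` from `torch.load(path, map_location="cpu")` on the checkpoint; a result of 5120 would indicate that the base model has to be the 13B variant.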

Could you please guide me on how to correctly load the 13B model? Any help would be greatly appreciated.

Thank you.

Detailed logs:

```
Training Alpaca-LoRA model with params:
base_model: ./LTU_13B/stage4_all_mix_long_seq/checkpoint-20000/pytorch_model.bin
output_dir: ../exp/ltu_ft_toy_low_resource_2024-04-06_23-32-44/
batch_size: 256
micro_batch_size: 1
num_epochs: 1000
learning_rate: 0.0001
cutoff_len: 196
val_set_size: 0
lora_r: 8
lora_alpha: 16
lora_dropout: 0.05
lora_target_modules: ['q_proj', 'v_proj']
train_on_inputs: True
add_eos_token: False
group_by_length: True
wandb_project: ltu
wandb_run_name: ../exp/ltu_ft_toy_low_resource_2024-04-06_23-32-44/
wandb_watch: false
wandb_log_model: false
resume_from_checkpoint: False
prompt template: alpaca_short
Will load from ./LTU_13B/stage4_all_mix_long_seq/checkpoint-20000/pytorch_model.bin later, for implementation purpose, first load from ../../../pretrained_mdls/vicuna_ltu/
Loading checkpoint shards: 100%|██████████| 3/3 [00:12<00:00, 4.14s/it]
Some weights of the model checkpoint at ../../../pretrained_mdls/vicuna_ltu/ were not used when initializing LlamaForCausalLM: ['model.audio_encoder.blocks_v.5.norm2.weight', 'model.audio_encoder.blocks_a.9.attn.qkv.weight', 'model.audio_encoder.blocks_a.3.norm2.bias', 'model.audio_encoder.blocks_v.6.norm1_a.bias', 'model.audio_encoder.mlp_head.1.weight', 'model.audio_encoder.blocks_a.1.attn.qkv.bias', 'model.audio_encoder.mlp_head.0.bias', 'model.audio_encoder.blocks_a.4.norm1_v.weight', 'model.audio_encoder.blocks_a.10.norm2_v.weight', 'model.audio_encoder.blocks_v.10.attn.proj.weight', 'model.audio_encoder.blocks_v.4.norm1.bias', 'model.audio_encoder.blocks_v.0.norm2.bias', 'model.audio_encoder.blocks_v.1.mlp.fc1.bias', 'model.audio_encoder.blocks_a.9.norm1_a.weight', 'model.audio_encoder.blocks_a.6.norm2.bias', 'model.audio_encoder.blocks_a.6.attn.proj.weight', 'model.audio_encoder.blocks_a.6.norm1_v.weight',
'model.audio_encoder.patch_embed_a.proj.weight', 'model.audio_encoder.blocks_a.7.norm2_a.bias', 'model.audio_encoder.blocks_v.1.mlp.fc2.weight', 'model.audio_encoder.blocks_a.5.norm2_a.weight', 'model.audio_encoder.blocks_v.6.attn.qkv.bias', 'model.audio_encoder.blocks_v.1.norm2_a.bias', 'model.audio_encoder.blocks_a.4.norm2_a.bias', 'model.audio_encoder.blocks_a.0.norm2_v.weight', 'model.audio_encoder.blocks_a.5.mlp.fc1.bias', 'model.audio_encoder.blocks_a.6.attn.qkv.bias', 'model.audio_encoder.blocks_a.6.norm2_v.bias', 'model.audio_encoder.blocks_v.8.norm2.weight', 'model.audio_encoder.blocks_v.1.attn.qkv.weight', 'model.audio_encoder.blocks_v.2.mlp.fc2.bias', 'model.audio_encoder.blocks_a.8.norm2.bias', 'model.audio_encoder.blocks_u.0.norm1_v.bias', 'model.audio_encoder.blocks_a.5.attn.qkv.bias', 'model.audio_encoder.blocks_v.10.attn.qkv.weight', 'model.audio_encoder.blocks_a.2.norm1_a.weight', 'model.audio_encoder.blocks_a.9.norm1.bias', 'model.audio_encoder.blocks_a.3.mlp.fc2.weight', 'model.audio_encoder.blocks_a.3.attn.proj.weight', 'model.audio_encoder.blocks_a.9.attn.proj.bias', 'model.audio_encoder.blocks_v.0.norm1.weight', 'model.audio_encoder.blocks_a.5.attn.proj.weight', 'model.audio_encoder.blocks_v.1.norm2_v.weight', 'model.audio_encoder.modality_a', 'model.audio_encoder.blocks_u.0.mlp.fc2.bias', 'model.audio_encoder.blocks_v.7.mlp.fc2.bias', 'model.audio_encoder.blocks_v.2.norm2.bias', 'model.audio_encoder.blocks_v.3.attn.proj.weight', 'model.audio_encoder.blocks_u.0.norm2_a.bias', 'model.audio_encoder.blocks_v.6.norm2.weight', 'model.audio_encoder.blocks_v.9.norm1_a.weight', 'model.audio_encoder.blocks_a.4.mlp.fc2.bias', 'model.audio_encoder.blocks_a.2.attn.proj.bias', 'model.audio_encoder.blocks_v.5.attn.qkv.weight', 'model.audio_encoder.blocks_u.0.attn.qkv.weight', 'model.audio_encoder.blocks_v.0.norm1.bias', 'model.audio_encoder.blocks_v.3.mlp.fc1.weight', 'model.audio_encoder.blocks_v.5.attn.proj.weight', 
'model.audio_encoder.blocks_a.9.norm1_v.weight', 'model.audio_encoder.norm_a.bias', 'model.audio_encoder.blocks_v.5.attn.qkv.bias', 'model.audio_encoder.blocks_u.0.norm2.bias', 'model.audio_encoder.blocks_v.0.norm1_a.weight', 'model.audio_encoder.blocks_v.7.attn.qkv.bias', 'model.audio_encoder.blocks_a.8.norm1.weight', 'model.audio_encoder.blocks_a.1.norm1_v.bias', 'model.audio_encoder.blocks_a.8.norm1_v.bias', 'model.audio_encoder.blocks_v.5.attn.proj.bias', 'model.audio_encoder.norm_v.bias', 'model.audio_encoder.blocks_v.6.norm2_a.weight', 'model.audio_encoder.blocks_a.0.norm1_v.weight', 'model.audio_encoder.blocks_v.7.norm2_a.weight', 'model.audio_encoder.blocks_v.6.mlp.fc1.weight', 'model.audio_encoder.blocks_a.8.attn.proj.bias', 'model.audio_encoder.blocks_a.0.norm2_a.weight', 'model.audio_encoder.blocks_u.0.mlp.fc1.bias', 'model.audio_encoder.blocks_a.4.norm2_v.bias', 'model.audio_encoder.blocks_v.9.attn.proj.weight', 'model.audio_encoder.blocks_v.2.norm2_v.weight', 'model.audio_encoder.blocks_v.9.norm2.bias', 'model.audio_encoder.blocks_v.2.mlp.fc1.bias', 'model.audio_encoder.blocks_v.4.norm2_a.weight', 'model.audio_encoder.blocks_v.8.norm1_v.bias', 'model.audio_encoder.blocks_a.3.norm2_a.bias', 'model.audio_encoder.blocks_v.6.norm2.bias', 'model.audio_encoder.blocks_a.6.attn.proj.bias', 'model.audio_encoder.blocks_v.2.norm1_v.weight', 'model.audio_encoder.blocks_v.6.norm1.weight', 'model.audio_encoder.blocks_a.0.attn.qkv.weight', 'model.audio_encoder.blocks_u.0.norm1_v.weight', 'model.audio_encoder.blocks_v.8.mlp.fc2.bias', 'model.audio_encoder.blocks_v.8.attn.proj.weight', 'model.audio_encoder.blocks_a.1.attn.proj.bias', 'model.audio_encoder.blocks_a.8.mlp.fc1.bias', 'model.audio_encoder.blocks_a.10.norm2_a.weight', 'model.audio_encoder.blocks_v.1.attn.qkv.bias', 'model.audio_encoder.blocks_v.4.mlp.fc2.bias', 'model.audio_encoder.blocks_v.8.norm1.bias', 'model.audio_encoder.blocks_u.0.norm2_a.weight', 'model.audio_encoder.blocks_v.9.mlp.fc1.weight', 
'model.audio_encoder.blocks_a.4.mlp.fc1.weight', 'model.audio_encoder.blocks_v.9.attn.qkv.bias', 'model.audio_encoder.blocks_v.1.norm2.weight', 'model.audio_encoder.blocks_v.0.mlp.fc2.bias', 'model.audio_encoder.blocks_a.4.attn.qkv.bias', 'model.audio_encoder.blocks_a.0.mlp.fc1.weight', 'model.audio_encoder.modality_v', 'model.audio_encoder.blocks_v.8.norm1.weight', 'model.audio_encoder.blocks_a.2.norm2_a.bias', 'model.audio_encoder.blocks_a.4.norm1.bias', 'model.audio_encoder.blocks_a.8.norm2_a.bias', 'model.audio_encoder.blocks_a.9.norm1_a.bias', 'model.audio_encoder.blocks_v.3.norm1.weight', 'model.audio_encoder.blocks_v.9.norm1_v.bias', 'model.audio_encoder.blocks_v.5.norm1_v.weight', 'model.audio_encoder.blocks_v.10.norm1_a.bias', 'model.audio_encoder.blocks_v.8.norm2_v.weight', 'model.audio_encoder.blocks_a.7.norm1.bias', 'model.audio_encoder.blocks_a.2.norm2_a.weight', 'model.audio_encoder.blocks_v.5.norm1.weight', 'model.audio_encoder.blocks_v.7.mlp.fc2.weight', 'model.audio_encoder.blocks_v.10.mlp.fc2.bias', 'model.audio_encoder.blocks_a.6.norm2_a.weight', 'model.audio_encoder.blocks_a.0.norm2_v.bias', 'model.audio_encoder.blocks_a.6.mlp.fc2.bias', 'model.audio_encoder.blocks_a.4.norm2_v.weight', 'model.audio_encoder.blocks_a.9.norm2.bias', 'model.audio_encoder.blocks_v.8.attn.qkv.bias', 'model.audio_encoder.blocks_a.1.norm2_v.bias', 'model.audio_encoder.blocks_v.2.attn.qkv.bias', 'model.audio_encoder.blocks_a.8.norm2_a.weight', 'model.audio_encoder.blocks_a.1.norm2_a.weight', 'model.audio_encoder.blocks_a.6.mlp.fc2.weight', 'model.audio_encoder.blocks_a.3.mlp.fc1.bias', 'model.audio_encoder.blocks_v.2.norm1.weight', 'model.audio_encoder.mlp_head.0.weight', 'model.audio_encoder.blocks_a.0.mlp.fc1.bias', 'model.audio_encoder.blocks_a.3.mlp.fc2.bias', 'model.audio_encoder.blocks_a.5.mlp.fc1.weight', 'model.audio_encoder.blocks_a.7.norm1_v.weight', 'model.audio_encoder.blocks_a.3.norm1.weight', 'model.audio_encoder.blocks_a.0.mlp.fc2.bias', 
'model.audio_encoder.blocks_a.2.norm2_v.bias', 'model.audio_encoder.blocks_a.4.attn.proj.bias', 'model.audio_encoder.blocks_v.3.norm2_a.weight', 'model.audio_encoder.blocks_a.6.norm1.weight', 'model.audio_encoder.blocks_a.9.norm1_v.bias', 'model.audio_encoder.blocks_a.10.norm1_v.weight', 'model.audio_encoder.blocks_v.9.norm2_a.weight', 'model.audio_encoder.blocks_v.1.norm2_v.bias', 'model.audio_encoder.blocks_v.2.norm2_a.bias', 'model.audio_encoder.blocks_a.7.norm2.weight', 'model.audio_encoder.blocks_a.10.norm1_a.bias', 'model.audio_encoder.blocks_v.6.mlp.fc1.bias', 'model.audio_encoder.blocks_a.4.norm2_a.weight', 'model.audio_encoder.blocks_v.2.mlp.fc2.weight', 'model.audio_encoder.blocks_v.7.norm2_a.bias', 'model.audio_encoder.blocks_v.4.attn.qkv.bias', 'model.audio_encoder.blocks_v.0.norm1_v.bias', 'model.audio_encoder.blocks_a.1.attn.proj.weight', 'model.audio_encoder.blocks_a.1.norm1_a.weight', 'model.audio_encoder.blocks_u.0.attn.qkv.bias', 'model.audio_encoder.blocks_v.5.norm2.bias', 'model.audio_encoder.blocks_a.8.attn.qkv.weight', 'model.audio_encoder.blocks_a.5.attn.proj.bias', 'model.audio_encoder.blocks_v.1.norm1.bias', 'model.audio_encoder.blocks_a.2.attn.proj.weight', 'model.audio_encoder.blocks_a.1.norm1_v.weight', 'model.audio_encoder.blocks_a.9.norm2_v.weight', 'model.audio_encoder.blocks_a.7.mlp.fc2.weight', 'model.audio_encoder.blocks_a.4.norm1_a.weight', 'model.audio_encoder.blocks_a.4.mlp.fc1.bias', 'model.audio_encoder.blocks_v.8.mlp.fc1.bias', 'model.audio_encoder.blocks_a.10.norm2_a.bias', 'model.audio_encoder.blocks_a.10.norm1_a.weight', 'model.audio_encoder.blocks_v.4.attn.proj.bias', 'model.audio_encoder.blocks_v.1.norm1_a.bias', 'model.audio_encoder.blocks_v.10.attn.proj.bias', 'model.audio_encoder.blocks_v.8.mlp.fc2.weight', 'model.audio_encoder.blocks_v.2.norm2_a.weight', 'model.audio_encoder.norm.bias', 'model.audio_encoder.blocks_a.0.norm2_a.bias', 'model.audio_encoder.blocks_v.5.mlp.fc2.weight', 
'model.audio_encoder.blocks_v.3.attn.qkv.weight', 'model.audio_encoder.blocks_v.0.attn.proj.weight', 'model.audio_encoder.blocks_v.4.norm1_a.bias', 'model.audio_encoder.blocks_a.0.attn.proj.weight', 'model.audio_encoder.blocks_a.0.norm2.weight', 'model.audio_encoder.blocks_v.1.mlp.fc2.bias', 'model.audio_encoder.blocks_v.2.norm1_a.bias', 'model.audio_encoder.blocks_v.3.norm1_a.weight', 'model.audio_encoder.blocks_a.2.mlp.fc1.bias', 'model.audio_encoder.blocks_v.8.norm2_v.bias', 'model.audio_encoder.blocks_a.7.norm1_a.weight', 'model.audio_encoder.blocks_a.9.norm1.weight', 'model.audio_encoder.blocks_a.2.norm1.weight', 'model.audio_encoder.blocks_v.10.norm1_v.weight', 'model.audio_encoder.blocks_a.3.norm1.bias', 'model.audio_encoder.blocks_a.4.norm1_v.bias', 'model.audio_encoder.blocks_v.3.norm1_v.weight', 'model.audio_encoder.blocks_v.3.mlp.fc2.bias', 'model.audio_encoder.blocks_a.7.mlp.fc1.bias', 'model.audio_encoder.blocks_a.5.norm2_v.bias', 'model.audio_encoder.blocks_v.2.attn.qkv.weight', 'model.audio_encoder.blocks_v.3.attn.qkv.bias', 'model.audio_encoder.blocks_a.2.attn.qkv.bias', 'model.audio_encoder.blocks_a.8.mlp.fc2.bias', 'model.audio_encoder.blocks_a.10.attn.qkv.weight', 'model.audio_encoder.blocks_a.2.norm2.bias', 'model.audio_encoder.blocks_v.9.norm2_v.bias', 'model.audio_encoder.blocks_a.5.norm2_a.bias', 'model.audio_encoder.blocks_v.10.attn.qkv.bias', 'model.audio_encoder.blocks_v.10.norm2_a.weight', 'model.audio_encoder.pos_embed_a', 'model.audio_encoder.blocks_a.4.mlp.fc2.weight', 'model.audio_encoder.blocks_v.9.attn.proj.bias', 'model.audio_encoder.blocks_v.6.mlp.fc2.bias', 'model.audio_encoder.blocks_a.2.norm2.weight', 'model.audio_encoder.blocks_v.4.norm1_a.weight', 'model.audio_encoder.blocks_a.6.norm2.weight', 'model.audio_encoder.blocks_a.8.norm1_v.weight', 'model.audio_encoder.blocks_v.6.norm2_v.weight', 'model.audio_encoder.blocks_v.0.norm1_a.bias', 'model.audio_encoder.blocks_a.5.norm1_v.weight', 
'model.audio_encoder.blocks_v.4.mlp.fc2.weight', 'model.audio_encoder.pos_embed_v', 'model.audio_encoder.blocks_a.7.attn.qkv.bias', 'model.audio_encoder.blocks_a.0.attn.proj.bias', 'model.audio_encoder.blocks_v.9.norm2.weight', 'model.audio_encoder.blocks_a.8.norm2_v.bias', 'model.audio_encoder.blocks_v.4.norm1_v.bias', 'model.audio_encoder.blocks_a.9.mlp.fc2.bias', 'model.audio_encoder.blocks_v.3.mlp.fc2.weight', 'model.audio_encoder.blocks_v.8.norm2_a.weight', 'model.audio_encoder.blocks_a.9.attn.proj.weight', 'model.audio_encoder.blocks_v.5.norm1_v.bias', 'model.audio_encoder.blocks_v.7.mlp.fc1.bias', 'model.audio_encoder.blocks_v.0.attn.proj.bias', 'model.audio_encoder.blocks_a.3.norm1_v.bias', 'model.audio_encoder.blocks_a.5.norm2.weight', 'model.audio_encoder.blocks_a.6.mlp.fc1.weight', 'model.audio_encoder.blocks_u.0.mlp.fc1.weight', 'model.audio_encoder.blocks_v.10.norm2_a.bias', 'model.audio_encoder.blocks_v.9.norm2_a.bias', 'model.audio_encoder.blocks_a.3.norm2_a.weight', 'model.audio_encoder.blocks_a.9.mlp.fc2.weight', 'model.audio_encoder.blocks_a.5.mlp.fc2.bias', 'model.audio_encoder.blocks_a.6.norm2_v.weight', 'model.audio_encoder.blocks_v.3.attn.proj.bias', 'model.audio_encoder.blocks_a.3.norm2_v.bias', 'model.audio_encoder.blocks_v.5.mlp.fc1.weight', 'model.audio_encoder.blocks_a.0.mlp.fc2.weight', 'model.audio_encoder.blocks_v.10.norm2.weight', 'model.audio_encoder.blocks_v.7.norm1_v.bias', 'model.audio_encoder.blocks_v.7.norm2.weight', 'model.audio_encoder.blocks_a.10.attn.proj.bias', 'model.audio_encoder.blocks_v.6.attn.proj.bias', 'model.audio_encoder.blocks_a.8.attn.qkv.bias', 'model.audio_encoder.blocks_v.1.attn.proj.weight', 'model.audio_encoder.blocks_u.0.norm2_v.bias', 'model.audio_encoder.blocks_v.8.norm1_v.weight', 'model.audio_encoder.blocks_a.9.mlp.fc1.weight', 'model.audio_encoder.blocks_v.1.norm2_a.weight', 'model.audio_encoder.blocks_a.10.mlp.fc1.weight', 'model.audio_encoder.blocks_a.7.norm1.weight', 
'model.audio_encoder.blocks_v.6.norm1_v.weight', 'model.audio_encoder.blocks_v.5.norm1.bias', 'model.audio_encoder.blocks_a.7.attn.proj.weight', 'model.audio_encoder.blocks_a.10.norm1.weight', 'model.audio_encoder.blocks_v.0.norm2_a.weight', 'model.audio_encoder.blocks_v.9.norm1_a.bias', 'model.audio_encoder.blocks_v.8.attn.proj.bias', 'model.audio_encoder.blocks_a.5.norm2_v.weight', 'model.audio_encoder.blocks_u.0.norm1.weight', 'model.audio_encoder.blocks_a.2.mlp.fc2.bias', 'model.audio_encoder.blocks_a.1.norm2.bias', 'model.audio_encoder.blocks_v.0.mlp.fc2.weight', 'model.audio_encoder.blocks_a.4.attn.qkv.weight', 'model.audio_encoder.blocks_u.0.norm1_a.weight', 'model.audio_encoder.blocks_a.1.mlp.fc2.weight', 'model.audio_encoder.blocks_a.5.norm1_v.bias', 'model.audio_encoder.blocks_a.4.norm2.weight', 'model.audio_encoder.blocks_v.7.mlp.fc1.weight', 'model.audio_encoder.blocks_a.7.attn.proj.bias', 'model.audio_encoder.blocks_v.0.mlp.fc1.bias', 'model.audio_encoder.blocks_a.2.norm1_v.bias', 'model.audio_encoder.blocks_v.4.mlp.fc1.bias', 'model.audio_encoder.blocks_a.1.norm1_a.bias', 'model.audio_encoder.norm_a.weight', 'model.audio_encoder.blocks_a.0.norm1_a.bias', 'model.audio_encoder.blocks_a.5.attn.qkv.weight', 'model.audio_encoder.blocks_v.2.norm2_v.bias', 'model.audio_encoder.blocks_a.2.mlp.fc1.weight', 'model.audio_encoder.blocks_v.10.norm1_a.weight', 'model.audio_encoder.patch_embed_v.proj.bias', 'model.audio_encoder.blocks_a.5.norm1_a.weight', 'model.audio_encoder.blocks_u.0.norm2_v.weight', 'model.audio_encoder.blocks_v.3.norm2_a.bias', 'model.audio_encoder.blocks_v.6.norm1_a.weight', 'model.audio_encoder.blocks_a.8.attn.proj.weight', 'model.audio_encoder.blocks_a.6.norm1_a.weight', 'model.audio_encoder.blocks_v.10.mlp.fc1.weight', 'model.audio_encoder.blocks_v.9.mlp.fc2.bias', 'model.audio_encoder.blocks_v.0.norm2_a.bias', 'model.audio_encoder.blocks_a.0.norm1_a.weight', 'model.audio_encoder.blocks_a.7.norm2.bias', 
'model.audio_encoder.blocks_v.10.norm2.bias', 'model.audio_encoder.blocks_a.8.mlp.fc1.weight', 'model.audio_encoder.blocks_v.3.norm2_v.bias', 'model.audio_encoder.blocks_a.6.norm1_a.bias', 'model.audio_encoder.blocks_a.8.norm1_a.weight', 'model.audio_encoder.blocks_a.0.norm2.bias', 'model.audio_encoder.blocks_a.6.mlp.fc1.bias', 'model.audio_encoder.blocks_a.1.norm2.weight', 'model.audio_encoder.blocks_a.6.attn.qkv.weight', 'model.audio_encoder.blocks_v.1.norm2.bias', 'model.audio_encoder.blocks_v.0.attn.qkv.bias', 'model.audio_encoder.blocks_v.8.norm1_a.weight', 'model.audio_encoder.blocks_v.7.norm2_v.bias', 'model.audio_encoder.blocks_a.2.norm1_v.weight', 'model.audio_encoder.blocks_a.3.norm1_a.weight', 'model.audio_encoder.blocks_a.10.norm2.bias', 'model.audio_encoder.blocks_a.7.mlp.fc2.bias', 'model.audio_encoder.mlp_head.1.bias', 'model.audio_encoder.blocks_v.4.norm2_v.weight', 'model.audio_encoder.blocks_a.1.attn.qkv.weight', 'model.audio_encoder.blocks_a.9.mlp.fc1.bias', 'model.audio_encoder.blocks_a.6.norm1.bias', 'model.audio_encoder.blocks_v.9.mlp.fc1.bias', 'model.audio_encoder.patch_embed_a.proj.bias', 'model.audio_encoder.patch_embed_v.proj.weight', 'model.audio_encoder.blocks_a.2.mlp.fc2.weight', 'model.audio_encoder.blocks_a.4.attn.proj.weight', 'model.audio_encoder.blocks_v.5.norm2_v.weight', 'model.audio_encoder.blocks_a.6.norm1_v.bias', 'model.audio_encoder.blocks_v.10.norm1.weight', 'model.audio_encoder.blocks_u.0.norm1_a.bias', 'model.audio_encoder.blocks_v.4.norm1.weight', 'model.audio_encoder.blocks_a.0.norm1_v.bias', 'model.audio_encoder.blocks_v.7.attn.proj.weight', 'model.audio_encoder.blocks_v.4.norm1_v.weight', 'model.audio_encoder.blocks_v.7.attn.proj.bias', 'model.audio_encoder.blocks_a.1.norm2_a.bias', 'model.audio_encoder.blocks_a.7.norm2_v.bias', 'model.audio_encoder.blocks_v.3.norm2.weight', 'model.audio_encoder.blocks_v.4.norm2_a.bias', 'model.audio_encoder.blocks_v.2.mlp.fc1.weight', 'model.audio_encoder.blocks_a.3.norm2.weight', 
'model.audio_encoder.blocks_v.1.norm1_v.bias', 'model.audio_encoder.blocks_v.2.attn.proj.bias', 'model.audio_encoder.blocks_v.8.norm2.bias', 'model.audio_encoder.blocks_u.0.norm2.weight', 'model.audio_encoder.blocks_v.10.norm1.bias', 'model.audio_encoder.blocks_a.10.mlp.fc2.bias', 'model.audio_encoder.blocks_a.5.norm1.weight', 'model.audio_encoder.blocks_v.9.norm2_v.weight', 'model.audio_encoder.blocks_v.4.norm2_v.bias', 'model.audio_encoder.blocks_v.7.norm1.bias', 'model.audio_encoder.blocks_v.0.attn.qkv.weight', 'model.audio_encoder.blocks_a.2.norm1.bias', 'model.audio_encoder.blocks_v.0.norm2_v.weight', 'model.audio_encoder.blocks_a.4.norm2.bias', 'model.audio_encoder.blocks_a.6.norm2_a.bias', 'model.audio_encoder.blocks_v.2.norm1.bias', 'model.audio_encoder.blocks_a.9.norm2_a.weight', 'model.audio_encoder.blocks_a.10.norm2.weight', 'model.audio_encoder.blocks_v.7.norm1_a.weight', 'model.audio_encoder.blocks_v.6.attn.qkv.weight', 'model.audio_encoder.blocks_a.10.mlp.fc1.bias', 'model.audio_encoder.blocks_v.0.norm2.weight', 'model.audio_encoder.blocks_v.6.norm2_v.bias', 'model.audio_encoder.blocks_u.0.norm1.bias', 'model.audio_encoder.blocks_a.9.attn.qkv.bias', 'model.audio_encoder.blocks_v.4.attn.qkv.weight', 'model.audio_encoder.blocks_v.5.mlp.fc1.bias', 'model.audio_encoder.blocks_a.4.norm1_a.bias', 'model.audio_encoder.blocks_a.2.norm2_v.weight', 'model.audio_encoder.blocks_v.3.mlp.fc1.bias', 'model.audio_encoder.blocks_v.6.norm2_a.bias', 'model.audio_encoder.blocks_v.4.norm2.weight', 'model.audio_encoder.blocks_v.10.mlp.fc2.weight', 'model.audio_encoder.blocks_a.1.norm1.weight', 'model.audio_encoder.blocks_v.3.norm1.bias', 'model.audio_encoder.blocks_v.8.norm1_a.bias', 'model.audio_encoder.blocks_a.8.norm1.bias', 'model.audio_encoder.blocks_a.0.norm1.weight', 'model.audio_encoder.blocks_a.8.norm1_a.bias', 'model.audio_encoder.blocks_a.10.attn.qkv.bias', 'model.audio_encoder.blocks_u.0.attn.proj.bias', 'model.audio_encoder.blocks_v.7.norm1_v.weight', 
'model.audio_encoder.blocks_v.3.norm1_v.bias', 'model.audio_encoder.blocks_a.8.norm2_v.weight', 'model.audio_encoder.blocks_a.7.norm1_a.bias', 'model.audio_encoder.blocks_v.1.norm1_a.weight', 'model.audio_encoder.blocks_v.3.norm2.bias', 'model.audio_encoder.blocks_a.3.attn.qkv.bias', 'model.audio_encoder.blocks_a.10.norm2_v.bias', 'model.audio_encoder.blocks_v.10.mlp.fc1.bias', 'model.audio_encoder.blocks_a.8.norm2.weight', 'model.audio_encoder.blocks_a.10.norm1.bias', 'model.audio_encoder.blocks_v.9.norm1_v.weight', 'model.audio_encoder.blocks_a.3.norm2_v.weight', 'model.audio_encoder.blocks_v.7.norm1_a.bias', 'model.audio_encoder.blocks_a.0.attn.qkv.bias', 'model.audio_encoder.blocks_a.3.attn.qkv.weight', 'model.audio_encoder.blocks_a.1.mlp.fc2.bias', 'model.audio_encoder.norm_v.weight', 'model.audio_encoder.blocks_v.3.norm2_v.weight', 'model.audio_encoder.blocks_v.5.norm2_v.bias', 'model.audio_encoder.blocks_v.9.norm1.bias', 'model.audio_encoder.blocks_a.9.norm2_a.bias', 'model.audio_encoder.blocks_a.5.mlp.fc2.weight', 'model.audio_encoder.blocks_a.7.norm2_a.weight', 'model.audio_encoder.blocks_a.5.norm1.bias', 'model.audio_encoder.blocks_v.3.norm1_a.bias', 'model.audio_encoder.blocks_v.6.norm1_v.bias', 'model.audio_encoder.blocks_v.7.norm2_v.weight', 'model.audio_encoder.blocks_v.0.norm1_v.weight', 'model.audio_encoder.blocks_a.8.mlp.fc2.weight', 'model.audio_encoder.blocks_v.0.mlp.fc1.weight', 'model.audio_encoder.blocks_v.5.norm2_a.weight', 'model.audio_encoder.blocks_a.7.norm2_v.weight', 'model.audio_encoder.blocks_v.10.norm2_v.weight', 'model.audio_encoder.blocks_v.2.attn.proj.weight', 'model.audio_encoder.blocks_v.0.norm2_v.bias', 'model.audio_encoder.blocks_a.4.norm1.weight', 'model.audio_encoder.blocks_v.7.norm1.weight', 'model.audio_encoder.blocks_v.4.attn.proj.weight', 'model.audio_encoder.blocks_a.3.attn.proj.bias', 'model.audio_encoder.blocks_a.2.norm1_a.bias', 'model.audio_encoder.blocks_v.4.norm2.bias', 
'model.audio_encoder.blocks_v.2.norm1_a.weight', 'model.audio_encoder.blocks_v.2.norm2.weight', 'model.audio_encoder.blocks_a.2.attn.qkv.weight', 'model.audio_encoder.blocks_v.6.mlp.fc2.weight', 'model.audio_encoder.blocks_v.8.attn.qkv.weight', 'model.audio_encoder.blocks_a.9.norm2_v.bias', 'model.audio_encoder.blocks_v.10.norm1_v.bias', 'model.audio_encoder.blocks_a.0.norm1.bias', 'model.audio_encoder.blocks_v.10.norm2_v.bias', 'model.audio_encoder.blocks_v.1.mlp.fc1.weight', 'model.audio_encoder.blocks_v.5.mlp.fc2.bias', 'model.audio_encoder.blocks_a.3.mlp.fc1.weight', 'model.audio_encoder.blocks_a.1.mlp.fc1.bias', 'model.audio_encoder.blocks_a.10.mlp.fc2.weight', 'model.audio_encoder.blocks_a.9.norm2.weight', 'model.audio_encoder.blocks_a.1.mlp.fc1.weight', 'model.audio_encoder.blocks_a.3.norm1_v.weight', 'model.audio_encoder.blocks_v.8.mlp.fc1.weight', 'model.audio_encoder.blocks_v.1.attn.proj.bias', 'model.audio_encoder.blocks_u.0.mlp.fc2.weight', 'model.audio_encoder.blocks_a.1.norm2_v.weight', 'model.audio_encoder.blocks_a.10.norm1_v.bias', 'model.audio_encoder.blocks_v.5.norm2_a.bias', 'model.audio_encoder.blocks_v.2.norm1_v.bias', 'model.audio_encoder.blocks_v.8.norm2_a.bias', 'model.audio_encoder.blocks_u.0.attn.proj.weight', 'model.audio_encoder.blocks_a.7.mlp.fc1.weight', 'model.audio_encoder.blocks_a.1.norm1.bias', 'model.audio_encoder.blocks_v.9.norm1.weight', 'model.audio_encoder.blocks_a.7.attn.qkv.weight', 'model.audio_encoder.blocks_a.5.norm1_a.bias', 'model.audio_encoder.norm.weight', 'model.audio_encoder.blocks_a.3.norm1_a.bias', 'model.audio_encoder.blocks_v.1.norm1.weight', 'model.audio_encoder.blocks_v.6.attn.proj.weight', 'model.audio_encoder.blocks_v.1.norm1_v.weight', 'model.audio_encoder.blocks_v.4.mlp.fc1.weight', 'model.audio_encoder.blocks_a.7.norm1_v.bias', 'model.audio_encoder.blocks_a.10.attn.proj.weight', 'model.audio_encoder.blocks_v.9.attn.qkv.weight', 'model.audio_encoder.blocks_a.5.norm2.bias', 
'model.audio_encoder.blocks_v.5.norm1_a.bias', 'model.audio_encoder.blocks_v.6.norm1.bias', 'model.audio_encoder.blocks_v.5.norm1_a.weight', 'model.audio_encoder.blocks_v.9.mlp.fc2.weight', 'model.audio_encoder.blocks_v.7.norm2.bias', 'model.audio_encoder.blocks_v.7.attn.qkv.weight']
- This IS expected if you are initializing LlamaForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing LlamaForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of LlamaForCausalLM were not initialized from the model checkpoint at ../../../pretrained_mdls/vicuna_ltu/ and are newly initialized: ['model.audio_encoder.time_tr.attn_ln.weight', 'model.audio_encoder.layer_tr.mlp.2.weight', 'model.audio_encoder.layer_tr.attn.out.bias', 'model.audio_encoder.time_tr.attn.query.bias', 'model.audio_encoder.time_tr.mlp_ln.bias', 'model.audio_encoder.time_tr.attn.query.weight', 'model.audio_encoder.layer_tr.mlp.0.weight', 'model.audio_encoder.layer_tr.mlp.2.bias', 'model.audio_encoder.time_tr.attn.out.weight', 'model.audio_encoder.time_tr.attn.key.weight', 'model.audio_encoder.time_tr.mlp_ln.weight', 'model.audio_proj.1.weight', 'model.audio_encoder.time_tr.attn.value.weight', 'model.audio_encoder.layer_tr.attn.query.bias', 'model.audio_encoder.layer_tr.attn_ln.bias', 'model.audio_encoder.layer_tr.mlp_ln.weight', 'model.audio_encoder.time_tr.mlp.2.weight', 'model.audio_encoder.layer_tr.attn.value.bias', 'model.audio_encoder.layer_tr.attn.out.weight', 'model.audio_encoder.time_tr.attn.out.bias', 'model.audio_encoder.time_tr.mlp.0.weight', 'model.audio_encoder.layer_tr.mlp.0.bias', 'model.audio_encoder.time_tr.mlp.0.bias', 'model.audio_encoder.time_tr.attn.value.bias', 'model.audio_encoder.layer_tr.attn_ln.weight', 'model.audio_encoder.layer_tr.attn.value.weight', 'model.audio_encoder.layer_tr.mlp_ln.bias', 'model.audio_encoder.layer_tr.attn.query.weight', 'model.audio_encoder.layer_tr.attn.key.weight', 'model.audio_proj.1.bias', 'model.audio_encoder.time_tr.mlp.2.bias', 'model.audio_encoder.time_tr.attn_ln.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Traceback (most recent call last):
  File "ltu_main/src/ltu/train_script/../finetune_low_resource.py", line 285, in <module>
    fire.Fire(train)
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "code/forks/ltu_main/src/ltu/train_script/../finetune_low_resource.py", line 235, in train
    msg = model.load_state_dict(state_dict, strict=False)
  File "conda/envs/venv_ltu/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1671, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
    size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.0.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.1.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.1.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.1.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.2.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.2.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.2.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.2.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.3.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.3.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.3.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.3.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.4.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.4.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.4.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.4.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.5.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.5.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.5.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.5.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.6.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.6.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.6.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.6.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.7.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.7.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.7.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.7.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.8.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.8.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.8.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.8.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
    size mismatch for base_model.model.model.layers.9.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]).
    size mismatch for base_model.model.model.layers.9.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]).
size mismatch for base_model.model.model.layers.9.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.9.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.10.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.10.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.10.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.10.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.11.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.11.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.11.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). 
size mismatch for base_model.model.model.layers.11.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.12.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.12.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.12.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.12.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.13.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.13.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.13.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.13.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). 
size mismatch for base_model.model.model.layers.14.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.14.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.14.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.14.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.15.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.15.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.15.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.15.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.16.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). 
size mismatch for base_model.model.model.layers.16.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.16.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.16.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.17.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.17.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.17.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.17.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.18.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.18.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). 
size mismatch for base_model.model.model.layers.18.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.18.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.19.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.19.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.19.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.19.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.20.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.20.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.20.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). 
size mismatch for base_model.model.model.layers.20.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.21.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.21.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.21.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.21.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.22.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.22.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.22.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.22.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). 
size mismatch for base_model.model.model.layers.23.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.23.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.23.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.23.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.24.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.24.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.24.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.24.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.25.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). 
size mismatch for base_model.model.model.layers.25.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.25.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.25.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.26.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.26.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.26.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.26.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.27.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.27.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). 
size mismatch for base_model.model.model.layers.27.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.27.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.28.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.28.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.28.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.28.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.29.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.29.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.29.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). 
size mismatch for base_model.model.model.layers.29.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.30.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.30.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.30.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.30.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.31.self_attn.q_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.31.self_attn.q_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). size mismatch for base_model.model.model.layers.31.self_attn.v_proj.lora_A.default.weight: copying a param with shape torch.Size([8, 5120]) from checkpoint, the shape in current model is torch.Size([8, 4096]). size mismatch for base_model.model.model.layers.31.self_attn.v_proj.lora_B.default.weight: copying a param with shape torch.Size([5120, 8]) from checkpoint, the shape in current model is torch.Size([4096, 8]). 
size mismatch for base_model.model.model.audio_proj.1.weight: copying a param with shape torch.Size([5120, 768]) from checkpoint, the shape in current model is torch.Size([4096, 1280]).
size mismatch for base_model.model.model.audio_proj.1.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([4096]).
```
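Almost every entry in the error above has the same form: the checkpoint's tensor is sized for a hidden dimension of 5120, while the model built at runtime uses 4096. For reference, LLaMA-7B has a hidden size of 4096 and LLaMA-13B has 5120. The `audio_proj.1.weight` entry additionally differs in its input width (768 vs 1280). When an error this long scrolls past, collapsing the per-layer entries makes the pattern easier to see. The following is a minimal sketch using only the standard library. `summarize_mismatches` is a hypothetical helper, not part of the LTU code, and the regex assumes the exact message format shown in this traceback.

```python
import re
from collections import Counter

# One PyTorch "size mismatch" entry: parameter name, checkpoint shape, model shape.
_MISMATCH = re.compile(
    r"size mismatch for (?P<name>\S+): copying a param with shape "
    r"torch\.Size\(\[(?P<ckpt>[\d, ]*)\]\) from checkpoint, "
    r"the shape in current model is torch\.Size\(\[(?P<model>[\d, ]*)\]\)"
)


def summarize_mismatches(error_text: str) -> Counter:
    """Count mismatches per (parameter pattern, checkpoint shape, model shape)."""
    counts: Counter = Counter()
    for m in _MISMATCH.finditer(error_text):
        # Replace the layer index so layers.0 ... layers.N collapse into one row.
        pattern = re.sub(r"\.layers\.\d+\.", ".layers.*.", m.group("name"))
        counts[(pattern, m.group("ckpt"), m.group("model"))] += 1
    return counts


if __name__ == "__main__":
    # A short excerpt of the error message above.
    log = (
        "size mismatch for base_model.model.model.layers.0.self_attn.v_proj.lora_A.default.weight: "
        "copying a param with shape torch.Size([8, 5120]) from checkpoint, "
        "the shape in current model is torch.Size([8, 4096]). "
        "size mismatch for base_model.model.model.layers.1.self_attn.v_proj.lora_A.default.weight: "
        "copying a param with shape torch.Size([8, 5120]) from checkpoint, "
        "the shape in current model is torch.Size([8, 4096]). "
        "size mismatch for base_model.model.model.audio_proj.1.weight: "
        "copying a param with shape torch.Size([5120, 768]) from checkpoint, "
        "the shape in current model is torch.Size([4096, 1280])."
    )
    for (pattern, ckpt, model), n in sorted(summarize_mismatches(log).items()):
        print(f"{n:3d} x {pattern}: checkpoint [{ckpt}] vs model [{model}]")
```

Run on the full log, this reduces the wall of text to one line per distinct parameter pattern. A consistent 5120-vs-4096 split across all layers points at the base model rather than at individual weights.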

YuanGongND commented 3 months ago

Thanks for reporting the bug.

Unfortunately, I do not have time to fix this in the near future, so you might need to fix it yourself.

-Yuan

EnisBerk commented 3 months ago

Thank you for your quick response and for letting me know about the timeline.

As I try to resolve this issue myself, I was wondering whether you have any pointers, or specific things to watch out for, from when you first got the 13B model working. Any insights or guidance would be greatly appreciated, but I completely understand if you don't have time to respond right now.

Best, Enis

EnisBerk commented 2 months ago

It turns out that the problem was the base Vicuna checkpoint I was loading, which was the 7B version. I replaced it with the 13B version, and it worked.
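Because LoRA weights only fit the base model they were trained against, this kind of mismatch can be caught before `load_state_dict` runs. The sketch below is a hypothetical pre-flight check, not part of the LTU repository. It reads the hidden size from a `q_proj.lora_A` weight in the checkpoint and compares it with the base model's hidden size. A LoRA A matrix on `q_proj` has shape `[r, hidden_size]`, as the `[8, 5120]` tensors in the error show. The check assumes the checkpoint is a flat dict of tensors as returned by `torch.load`, and that the base model's hidden size is read from its config, for example `transformers.AutoConfig.from_pretrained(path).hidden_size`.

```python
from types import SimpleNamespace
from typing import Any, Mapping

# Hidden sizes of the LLaMA-family base models (Vicuna is fine-tuned from LLaMA).
KNOWN_SIZES = {4096: "7B", 5120: "13B"}


def checkpoint_hidden_size(state_dict: Mapping[str, Any]) -> int:
    """Infer the hidden size a LoRA checkpoint was trained against.

    A LoRA A matrix on q_proj has shape [r, hidden_size], so its second
    dimension is the base model's hidden size.
    """
    for name, tensor in state_dict.items():
        if "q_proj.lora_A" in name and name.endswith("weight"):
            return int(tensor.shape[1])
    raise KeyError("no q_proj.lora_A weight found in the checkpoint")


def check_base_model(state_dict: Mapping[str, Any], base_hidden_size: int) -> None:
    """Raise a readable error if the checkpoint and base model disagree."""
    ckpt_size = checkpoint_hidden_size(state_dict)
    if ckpt_size != base_hidden_size:
        raise ValueError(
            f"checkpoint expects hidden size {ckpt_size} "
            f"({KNOWN_SIZES.get(ckpt_size, 'unknown')}), but the base model has "
            f"{base_hidden_size} ({KNOWN_SIZES.get(base_hidden_size, 'unknown')}); "
            "load the matching Vicuna base checkpoint"
        )


if __name__ == "__main__":
    # Stand-in for torch.load(...) output: only .shape is used by the check.
    fake_ckpt = {
        "base_model.model.model.layers.0.self_attn.q_proj.lora_A.default.weight":
            SimpleNamespace(shape=(8, 5120)),
    }
    # Assumed real usage:
    #   base_hidden_size = AutoConfig.from_pretrained(base_model_path).hidden_size
    try:
        check_base_model(fake_ckpt, base_hidden_size=4096)
    except ValueError as err:
        print(err)
```

The traceback shows the load was called with `strict=False`. That flag only relaxes missing and unexpected keys; shape mismatches still raise. A wrong base model therefore surfaces only at this step, unless it is checked earlier as above.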

YuanGongND commented 2 months ago

@EnisBerk thanks so much for letting me know!