MILVLG / imp

a family of highly capable yet efficient large multimodal models
Apache License 2.0

finetune_lora_custom.sh #11

Closed M3Dade closed 9 months ago

M3Dade commented 9 months ago

I only changed

IMP_MODEL='./checkpoints/imp-v1-3b'

--data_path 
--image_folder 

but got the following output in my terminal:

You are using a model of type imp to instantiate a model of type llava. This is not supported for all configurations of models and can yield errors.
You are using a model of type imp to instantiate a model of type llava. This is not supported for all configurations of models and can yield errors.
You are using a model of type imp to instantiate a model of type llava. This is not supported for all configurations of models and can yield errors.
You are using a model of type imp to instantiate a model of type llava. This is not supported for all configurations of models and can yield errors.
Downloading config.json: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 576/576 [00:00<00:00, 1.89MB/s]
[2024-02-22 16:33:49,885] [WARNING] [partition_parameters.py:836:_post_init_method] param `probe` in SiglipMultiheadAttentionPoolingHead not on GPU so was not broadcasted from rank 0
[2024-02-22 16:33:53,686] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 7.77B parameters
Traceback (most recent call last):
  File "/data1/*** /imp/imp_llava/train/train_mem.py", line 15, in <module>
    train()
  File "/data1/***/imp/./imp_llava/train/train.py", line 827, in train
    model = LlavaLlamaForCausalLM.from_pretrained(
  File "/data1/***/site-packages/transformers/modeling_utils.py", line 2903, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/data1/***/site-packages/transformers/modeling_utils.py", line 3125, in _load_pretrained_model
    model.apply(model._initialize_weights)
  File "/data1/***/site-packages/torch/nn/modules/module.py", line 884, in apply
    module.apply(fn)
  File "/data1/***/site-packages/torch/nn/modules/module.py", line 884, in apply
    module.apply(fn)
  File "/data1/***/site-packages/torch/nn/modules/module.py", line 885, in apply
    fn(self)
  File "/data1/***/site-packages/transformers/modeling_utils.py", line 1261, in _initialize_weights
    self._init_weights(module)
  File "/data1/***/site-packages/transformers/models/llama/modeling_llama.py", line 472, in _init_weights
    module.weight.data[module.padding_idx].zero_()
IndexError: index 50256 is out of bounds for dimension 0 with size 0
[2024-02-22 16:33:55,511] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2275311
[2024-02-22 16:33:55,524] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2275312
[2024-02-22 16:33:55,535] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2275313
[2024-02-22 16:33:55,545] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 2275314
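
A plausible reading of this traceback: the repeated "model of type imp to instantiate a model of type llava" warnings show the imp checkpoint being routed through the LlavaLlamaForCausalLM/LLaMA code path, and under DeepSpeed ZeRO-3 a partitioned parameter's local .data can be an empty tensor, so LLaMA's _init_weights then indexes padding_idx (50256) into an embedding with zero rows. A minimal sketch of the failing statement from modeling_llama.py line 472 (the vocabulary and embedding sizes below are illustrative assumptions, not values read from the imp config):

import torch
import torch.nn as nn

# Illustrative sizes; only padding_idx matches the error message.
emb = nn.Embedding(50257, 8, padding_idx=50256)

# Under ZeRO-3, a partitioned parameter's .data can be empty on a given rank.
emb.weight.data = torch.empty(0)

# This is the statement LLaMA's _init_weights runs for embeddings; it fails identically:
emb.weight.data[emb.padding_idx].zero_()
# IndexError: index 50256 is out of bounds for dimension 0 with size 0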
M3Dade commented 9 months ago

However, after I modified line 816 of imp_llava/train/train.py (if 'phi' or 'imp' in model_args.model_name_or_path:), the .sh script could run.
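
As a side note, the condition quoted above is a common Python pitfall: it parses as 'phi' or ('imp' in model_args.model_name_or_path), and since the non-empty string 'phi' is always truthy, the test is True for every model path. A minimal illustration (the explicit per-substring check shown is only one possible rewrite, not necessarily the change made here):

path = './checkpoints/imp-v1-3b'

# Parses as: 'phi' or ('imp' in path) -> always True, because 'phi' is a non-empty string.
print(bool('phi' or 'imp' in path))         # True for any value of path

# Checking each substring explicitly behaves as intended:
print('phi' in path or 'imp' in path)       # True for this path
print(any(k in './llama-7b' for k in ('phi', 'imp')))  # False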

[2024-02-22 20:22:46,314] [WARNING] [partition_parameters.py:836:_post_init_method] param `probe` in SiglipMultiheadAttentionPoolingHead not on GPU so was not broadcasted from rank 0
[2024-02-22 20:22:47,409] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 6.43B parameters
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:04<00:00,  1.56it/s]
[2024-02-22 20:22:52] [INFO] [./imp_llava/train/train.py:184] lora_module_names: ['out_proj', 'fc1', 'fc2', 'Wqkv', 'linear']
[2024-02-22 20:23:32] [INFO] [./imp_llava/train/train.py:965] unfreezing wte.weight
Parameter Offload: Total persistent parameters: 1296288 in 459 params
{'loss': 2.6082, 'learning_rate': 2.5974025974025976e-06, 'epoch': 0.0}   

What's more, I noticed that in this case the model is initialized with 6.43B parameters, less than the 7.77B reported above.
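
For reference, the lora_module_names line in the second log is typically produced by a helper that walks the model and collects the leaf names of nn.Linear modules to target with LoRA. A rough sketch of such a helper in the style of LLaVA-derived training code (the keyword list and exclusions are assumptions; imp_llava's actual implementation may differ):

import torch.nn as nn

def find_all_linear_names(model, skip_keywords=('mm_projector', 'vision_tower', 'vision_resampler')):
    # Collect leaf names of nn.Linear modules as LoRA targets, skipping the vision side.
    names = set()
    for full_name, module in model.named_modules():
        if any(k in full_name for k in skip_keywords):
            continue
        if isinstance(module, nn.Linear):
            names.add(full_name.split('.')[-1])
    names.discard('lm_head')  # the output head is usually left out of LoRA
    return list(names)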