However, when I modified line 816 in imp_llava/train/train.py to `if 'phi' or 'imp' in model_args.model_name_or_path:`, the .sh script could run.
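Note that in Python this condition is always true: it parses as `('phi') or ('imp' in model_args.model_name_or_path)`, and the non-empty string `'phi'` is truthy, so the branch runs for every model name. A minimal sketch of the check as likely intended (the model name below is a hypothetical value for illustration):

```python
# The quoted condition parses as ('phi') or ('imp' in name); since 'phi' is a
# non-empty string, it is truthy and the branch is always taken.
name = "MILVLG/imp-v1-3b"  # hypothetical model_name_or_path

# The check as likely intended:
if 'phi' in name or 'imp' in name:
    print("phi/imp branch taken")

# Equivalent, more compact form:
if any(key in name for key in ('phi', 'imp')):
    print("phi/imp branch taken")
```

This may also explain why the script "works" after the edit: the condition no longer depends on the actual model name.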
```
[2024-02-22 20:22:46,314] [WARNING] [partition_parameters.py:836:_post_init_method] param `probe` in SiglipMultiheadAttentionPoolingHead not on GPU so was not broadcasted from rank 0
[2024-02-22 20:22:47,409] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 6.43B parameters
Loading checkpoint shards: 100%|██████████| 7/7 [00:04<00:00, 1.56it/s]
[2024-02-22 20:22:52] [INFO] [./imp_llava/train/train.py:184] lora_module_names: ['out_proj', 'fc1', 'fc2', 'Wqkv', 'linear']
[2024-02-22 20:23:32] [INFO] [./imp_llava/train/train.py:965] unfreezing wte.weight
Parameter Offload: Total persistent parameters: 1296288 in 459 params
{'loss': 2.6082, 'learning_rate': 2.5974025974025976e-06, 'epoch': 0.0}
```
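For context, target-module lists like the `lora_module_names` logged above are typically built by scanning the model for `nn.Linear` leaf modules. A minimal sketch of such a helper, assuming a standard PyTorch model (the function name and the `lm_head` exclusion are illustrative, not necessarily this repo's exact code):

```python
import torch.nn as nn

def find_all_linear_names(model: nn.Module) -> list[str]:
    """Collect the leaf names of all nn.Linear modules, e.g. 'fc1', 'Wqkv'."""
    names = set()
    for full_name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            # Keep only the leaf segment: 'layers.0.mlp.fc1' -> 'fc1'
            names.add(full_name.split('.')[-1])
    names.discard('lm_head')  # the output head is commonly left out of LoRA
    return sorted(names)

# Toy module mirroring the names in the log above:
class ToyBlock(nn.Module):
    def __init__(self):
        super().__init__()
        self.Wqkv = nn.Linear(8, 24)
        self.out_proj = nn.Linear(8, 8)
        self.fc1 = nn.Linear(8, 32)
        self.fc2 = nn.Linear(32, 8)

print(find_all_linear_names(ToyBlock()))  # ['Wqkv', 'fc1', 'fc2', 'out_proj']
```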
What's more, I found that this way the model initializes with 6.43B parameters, which is less than the 7.77B mentioned above.
I only changed that one line, but I get this information in my terminal.
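To double-check the 6.43B figure independently of the DeepSpeed log, parameter counts can be taken on the plain PyTorch model. A minimal sketch (note that under ZeRO-3 partitioning, `numel()` may report only the local shard, so this should run before DeepSpeed wraps the model):

```python
# Minimal sketch: count total and trainable parameters of a plain PyTorch model.
import torch.nn as nn

def count_params(model: nn.Module) -> tuple[int, int]:
    total = sum(p.numel() for p in model.parameters())
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return total, trainable

# Toy usage:
toy = nn.Linear(1000, 1000)
print(count_params(toy))  # (1001000, 1001000)
```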