Open BadUncleBoy opened 8 months ago
Have you made any modifications to the original released code? If so, please provide more information about what you modified. If not, please give your startup command like here. Thanks.
I get the following error when fine-tuning with LoRA:
ValueError: Target module LlamaDecoderLayer(
(self_attn): LlamaAttention(
(q_proj): Linear(in_features=2048, out_features=2048, bias=False)
(k_proj): Linear(in_features=2048, out_features=2048, bias=False)
(v_proj): Linear(in_features=2048, out_features=2048, bias=False)
(o_proj): Linear(in_features=2048, out_features=2048, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
(up_proj): Linear(in_features=2048, out_features=5632, bias=False)
(down_proj): Linear(in_features=5632, out_features=2048, bias=False)
(act_fn): SiLUActivation()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
) is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.
I hit the same problem. My command is unmodified from the "finetune.lora" case in run.sh.
Could you provide more information about this error, for example the surrounding logs?
Traceback (most recent call last):
File "/hy-tmp/MobileVLM-main/mobilevlm/train/train_mem.py", line 13, in <module>
train()
File "/hy-tmp/MobileVLM-main/mobilevlm/train/train.py", line 807, in train
model = get_peft_model(model, lora_config)
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/mapping.py", line 98, in get_peft_model
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/peft_model.py", line 893, in __init__
super().__init__(model, peft_config, adapter_name)
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/peft_model.py", line 112, in __init__
self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 180, in __init__
self.add_adapter(adapter_name, self.peft_config[adapter_name])
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 194, in add_adapter
self._find_and_replace(adapter_name)
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 352, in _find_and_replace
new_module = self._create_new_module(lora_config, adapter_name, target)
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 305, in _create_new_module
raise ValueError(
ValueError: Target module LlamaDecoderLayer(
(self_attn): LlamaAttention(
(q_proj): Linear(in_features=2048, out_features=2048, bias=False)
(k_proj): Linear(in_features=2048, out_features=2048, bias=False)
(v_proj): Linear(in_features=2048, out_features=2048, bias=False)
(o_proj): Linear(in_features=2048, out_features=2048, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
(up_proj): Linear(in_features=2048, out_features=5632, bias=False)
(down_proj): Linear(in_features=5632, out_features=2048, bias=False)
(act_fn): SiLUActivation()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
) is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.
[2024-02-27 04:18:03,745] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7582
[2024-02-27 04:18:03,746] [ERROR] [launch.py:321:sigkill_handler] ['/usr/local/miniconda3/envs/mobilevlm/bin/python', '-u', 'mobilevlm/train/train_mem.py', '--local_rank=0', '--deepspeed', 'scripts/deepspeed/zero2.json', '--lora_enable', 'True', '--lora_r', '128', '--lora_alpha', '256', '--learning_rate', '2e-4', '--model_name_or_path', './mtgv/MobileVLM_V2-1.7B', '--version', 'v1', '--data_path', 'data/eccv_train.json', '--image_folder', 'data/eccv_train', '--vision_tower', './mtgv/clip-vit-large-patch14-336', '--vision_tower_type', 'clip', '--pretrain_mm_mlp_adapter', './finetune-results/mobilevlm-1.pretrain/mm_projector.bin', '--mm_projector_type', 'ldpnet', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--group_by_modality_length', 'True', '--bf16', 'True', '--output_dir', './finetune-results/mobilevlm-2.finetune-lora', '--num_train_epochs', '1', '--per_device_train_batch_size', '16', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '50000', '--save_total_limit', '1', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '2048', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '4', '--lazy_preprocess', 'True', '--report_to', 'none'] exits with return code = 1
[2024-02-27 04:18:07,883] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
File "/hy-tmp/MobileVLM-main/scripts/mergelora.py", line 51, in <module>
merge_lora(sys.argv[1], sys.argv[2], sys.argv[3])
File "/hy-tmp/MobileVLM-main/scripts/mergelora.py", line 16, in merge_lora
lora_cfg_pretrained = AutoConfig.from_pretrained(model_path)
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1023, in from_pretrained
config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 620, in get_config_dict
config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 675, in _get_config_dict
resolved_config_file = cached_file(
File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/utils/hub.py", line 400, in cached_file
raise EnvironmentError(
OSError: ./finetune-results/mobilevlm-2.finetune-lora does not appear to have a file named config.json. Checkout 'https://huggingface.co/./finetune-results/mobilevlm-2.finetune-lora/None' for available files.
Done.
This is my problem.
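The second traceback in the log above (the OSError from mergelora.py) looks like a follow-on failure: training exited with return code 1, so the output directory never received a config.json, and the merge step then failed when AutoConfig looked for it. A minimal sketch of a guard that could run before the merge step, assuming only that the merge needs config.json in the output directory; the helper name has_model_config is hypothetical and not part of the repository:

```python
from pathlib import Path


def has_model_config(output_dir: str) -> bool:
    """Return True if output_dir contains the config.json that AutoConfig looks for."""
    return (Path(output_dir) / "config.json").is_file()


if __name__ == "__main__":
    # Output path taken from the log above; adjust to your own run.
    out = "./finetune-results/mobilevlm-2.finetune-lora"
    if not has_model_config(out):
        print(f"{out} has no config.json; training probably failed, so skip the merge")
```

With a check like this, only the first error (the ValueError raised inside get_peft_model) would need attention.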
Have you made sure your deepspeed and transformers versions are consistent with ours?
I'm having the same problem when fine-tuning MobileVLM V2 1.7B with LoRA.
VS Code debug settings:
{
    "configurations": [
        {
            "name": "Python Debugger: Python File",
            "type": "debugpy",
            "request": "launch",
            "program": "/home/bian/anaconda3/envs/mobilevlm/bin/deepspeed",
            "justMyCode": true,
            "console": "integratedTerminal",
            "args": [
                "mobilevlm/train/train_mem.py",
                "--deepspeed", "scripts/deepspeed/zero3.json",
                "--lora_enable", "True",
                "--lora_r", "128",
                "--lora_alpha", "256",
                "--learning_rate", "2e-4",
                "--model_name_or_path", "model/MobileVLM_V2-1.7B",
                "--version", "v1",
                "--data_path", "data/finetune_data/filtered_ScienceQA.json",
                "--image_folder", "data/finetune_data",
                "--vision_tower", "model/clip-vit-large-patch14-336",
                "--vision_tower_type", "clip",
                "--mm_projector_type", "ldpnetv2",
                "--mm_vision_select_layer", "-2",
                "--mm_use_im_start_end", "False",
                "--mm_use_im_patch_token", "False",
                "--image_aspect_ratio", "pad",
                "--group_by_modality_length", "True",
                "--bf16", "False",
                "--output_dir", "outputs/mobilevlm1.7b/mobilevlm_v2-2.finetune",
                "--num_train_epochs", "1",
                "--per_device_train_batch_size", "1",
                "--per_device_eval_batch_size", "1",
                "--gradient_accumulation_steps", "1",
                "--evaluation_strategy", "no",
                "--save_strategy", "steps",
                "--save_steps", "2000",
                "--save_total_limit", "1",
                "--learning_rate", "4e-5",
                "--weight_decay", "0.",
                "--warmup_ratio", "0.03",
                "--lr_scheduler_type", "cosine",
                "--logging_steps", "1",
                "--tf32", "False",
                "--model_max_length", "2048",
                "--gradient_checkpointing", "True",
                "--dataloader_num_workers", "4",
                "--lazy_preprocess", "True",
                "--report_to", "none"
            ],
            "env": {
                "PYTHONPATH": "/media/bian/sdb/zzq/MobileVLM"
            }
        }
    ]
}
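Note that the args list above passes --learning_rate twice (2e-4, then 4e-5). Assuming the training script parses its arguments with an argparse-based parser, the later value would normally win, so the run would use 4e-5. This is unrelated to the ValueError, but a small sketch like the following can flag such repeats; the function name duplicate_flags is hypothetical:

```python
from collections import defaultdict


def duplicate_flags(args):
    """Map each repeated --flag to the list of values it was given, in order."""
    seen = defaultdict(list)
    for i, token in enumerate(args):
        if token.startswith("--"):
            # Treat the next token as this flag's value unless it is itself a flag.
            nxt = args[i + 1] if i + 1 < len(args) else None
            value = nxt if nxt is not None and not nxt.startswith("--") else None
            seen[token].append(value)
    return {flag: values for flag, values in seen.items() if len(values) > 1}


if __name__ == "__main__":
    # Shortened excerpt of the args list from the debug settings above.
    args = [
        "--lora_enable", "True",
        "--learning_rate", "2e-4",
        "--mm_vision_select_layer", "-2",
        "--learning_rate", "4e-5",
    ]
    print(duplicate_flags(args))
```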
Dependency versions:
_libgcc_mutex 0.1 main defaults
_openmp_mutex 5.1 1_gnu defaults
accelerate 0.21.0 pypi_0 pypi
aiofiles 23.2.1 pypi_0 pypi
aiohttp 3.9.3 pypi_0 pypi
aiosignal 1.3.1 pypi_0 pypi
altair 5.2.0 pypi_0 pypi
anyio 4.3.0 pypi_0 pypi
async-timeout 4.0.3 pypi_0 pypi
attrs 23.2.0 pypi_0 pypi
av 11.0.0 pypi_0 pypi
bitsandbytes 0.41.0 pypi_0 pypi
bzip2 1.0.8 h5eee18b_5 defaults
ca-certificates 2023.12.12 h06a4308_0 defaults
certifi 2024.2.2 pypi_0 pypi
charset-normalizer 3.3.2 pypi_0 pypi
click 8.1.7 pypi_0 pypi
cmake 3.28.3 pypi_0 pypi
contourpy 1.2.0 pypi_0 pypi
cycler 0.12.1 pypi_0 pypi
decord 0.6.0 pypi_0 pypi
deepspeed 0.9.5 pypi_0 pypi
einops 0.6.1 pypi_0 pypi
einops-exts 0.0.4 pypi_0 pypi
exceptiongroup 1.2.0 pypi_0 pypi
fairscale 0.4.13 pypi_0 pypi
fastapi 0.103.0 pypi_0 pypi
ffmpy 0.3.2 pypi_0 pypi
filelock 3.13.1 pypi_0 pypi
flash-attn 2.3.2 pypi_0 pypi
fonttools 4.49.0 pypi_0 pypi
frozenlist 1.4.1 pypi_0 pypi
fsspec 2024.2.0 pypi_0 pypi
fvcore 0.1.5.post20221221 pypi_0 pypi
gradio 3.35.2 pypi_0 pypi
gradio-client 0.10.0 pypi_0 pypi
h11 0.14.0 pypi_0 pypi
hjson 3.1.0 pypi_0 pypi
httpcore 0.17.3 pypi_0 pypi
httpx 0.24.0 pypi_0 pypi
huggingface-hub 0.21.4 pypi_0 pypi
idna 3.6 pypi_0 pypi
ijson 3.2.3 pypi_0 pypi
iopath 0.1.10 pypi_0 pypi
jinja2 3.1.3 pypi_0 pypi
joblib 1.3.2 pypi_0 pypi
jsonschema 4.21.1 pypi_0 pypi
jsonschema-specifications 2023.12.1 pypi_0 pypi
kiwisolver 1.4.5 pypi_0 pypi
ld_impl_linux-64 2.38 h1181459_1 defaults
libffi 3.4.4 h6a678d5_0 defaults
libgcc-ng 11.2.0 h1234567_1 defaults
libgomp 11.2.0 h1234567_1 defaults
libstdcxx-ng 11.2.0 h1234567_1 defaults
libuuid 1.41.5 h5eee18b_0 defaults
lightning-utilities 0.11.1 pypi_0 pypi
linkify-it-py 2.0.3 pypi_0 pypi
lit 17.0.6 pypi_0 pypi
markdown-it-py 2.2.0 pypi_0 pypi
markdown2 2.4.8 pypi_0 pypi
markupsafe 2.1.5 pypi_0 pypi
matplotlib 3.8.3 pypi_0 pypi
mdit-py-plugins 0.3.3 pypi_0 pypi
mdurl 0.1.2 pypi_0 pypi
mpmath 1.3.0 pypi_0 pypi
multidict 6.0.5 pypi_0 pypi
ncurses 6.4 h6a678d5_0 defaults
networkx 3.2.1 pypi_0 pypi
ninja 1.11.1.1 pypi_0 pypi
numpy 1.25.0 pypi_0 pypi
nvidia-cublas-cu11 11.10.3.66 pypi_0 pypi
nvidia-cuda-cupti-cu11 11.7.101 pypi_0 pypi
nvidia-cuda-nvrtc-cu11 11.7.99 pypi_0 pypi
nvidia-cuda-runtime-cu11 11.7.99 pypi_0 pypi
nvidia-cudnn-cu11 8.5.0.96 pypi_0 pypi
nvidia-cufft-cu11 10.9.0.58 pypi_0 pypi
nvidia-curand-cu11 10.2.10.91 pypi_0 pypi
nvidia-cusolver-cu11 11.4.0.1 pypi_0 pypi
nvidia-cusparse-cu11 11.7.4.91 pypi_0 pypi
nvidia-nccl-cu11 2.14.3 pypi_0 pypi
nvidia-nvtx-cu11 11.7.91 pypi_0 pypi
openssl 3.0.13 h7f8727e_0 defaults
orjson 3.9.15 pypi_0 pypi
packaging 24.0 pypi_0 pypi
pandas 2.2.1 pypi_0 pypi
parameterized 0.9.0 pypi_0 pypi
peft 0.4.0 pypi_0 pypi
pillow 10.2.0 pypi_0 pypi
pip 24.0 pypi_0 pypi
portalocker 2.8.2 pypi_0 pypi
psutil 5.9.8 pypi_0 pypi
py-cpuinfo 9.0.0 pypi_0 pypi
pydantic 1.10.13 pypi_0 pypi
pydub 0.25.1 pypi_0 pypi
pygments 2.17.2 pypi_0 pypi
pyparsing 3.1.2 pypi_0 pypi
python 3.10.13 h955ad1f_0 defaults
python-dateutil 2.9.0.post0 pypi_0 pypi
python-multipart 0.0.9 pypi_0 pypi
pytorch-lightning 2.2.1 pypi_0 pypi
pytorchvideo 0.1.5 pypi_0 pypi
pytz 2024.1 pypi_0 pypi
pyyaml 6.0.1 pypi_0 pypi
readline 8.2 h5eee18b_0 defaults
referencing 0.33.0 pypi_0 pypi
regex 2023.12.25 pypi_0 pypi
requests 2.28.2 pypi_0 pypi
rpds-py 0.18.0 pypi_0 pypi
safetensors 0.4.2 pypi_0 pypi
scikit-learn 1.2.2 pypi_0 pypi
scipy 1.12.0 pypi_0 pypi
semantic-version 2.10.0 pypi_0 pypi
sentencepiece 0.1.99 pypi_0 pypi
setuptools 68.2.2 py310h06a4308_0 defaults
shortuuid 1.0.11 pypi_0 pypi
six 1.16.0 pypi_0 pypi
sniffio 1.3.1 pypi_0 pypi
sqlite 3.41.2 h5eee18b_0 defaults
starlette 0.27.0 pypi_0 pypi
sympy 1.12 pypi_0 pypi
tabulate 0.9.0 pypi_0 pypi
termcolor 2.4.0 pypi_0 pypi
threadpoolctl 3.3.0 pypi_0 pypi
timm 0.9.12 pypi_0 pypi
tk 8.6.12 h1ccaba5_0 defaults
tokenizers 0.13.3 pypi_0 pypi
toolz 0.12.1 pypi_0 pypi
torch 2.0.1 pypi_0 pypi
torchmetrics 1.3.2 pypi_0 pypi
torchvision 0.15.2 pypi_0 pypi
tqdm 4.66.2 pypi_0 pypi
transformers 4.33.1 pypi_0 pypi
triton 2.0.0 pypi_0 pypi
typing-extensions 4.10.0 pypi_0 pypi
tzdata 2024.1 pypi_0 pypi
uc-micro-py 1.0.3 pypi_0 pypi
urllib3 1.26.18 pypi_0 pypi
uvicorn 0.22.0 pypi_0 pypi
websockets 11.0.3 pypi_0 pypi
wheel 0.41.2 py310h06a4308_0 defaults
xz 5.4.6 h5eee18b_0 defaults
yacs 0.1.8 pypi_0 pypi
yarl 1.9.4 pypi_0 pypi
zlib 1.2.13 h5eee18b_0 defaults
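Of the packages listed above, the ones that appear in the traceback or in the maintainers' version question are peft, transformers, deepspeed, accelerate, and torch. A minimal sketch for printing just those versions from the active environment, using only the standard library; the package selection is a judgment call, not a list given by the maintainers:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    wanted = ["peft", "transformers", "deepspeed", "accelerate", "torch"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}=={ver}" if ver else f"{name}: not installed")
```

The output can be compared line by line against the versions pinned in the repository's requirements.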
Issue logs:
[2024-03-27 15:23:38,536] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-27 15:23:41,459] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-03-27 15:23:41,460] [INFO] [runner.py:555:main] cmd = /home/bian/anaconda3/envs/mobilevlm/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None mobilevlm/train/train_mem.py --deepspeed scripts/deepspeed/zero3.json --lora_enable True --lora_r 128 --lora_alpha 256 --learning_rate 2e-4 --model_name_or_path model/MobileVLM_V2-1.7B --version v1 --data_path data/finetune_data/filtered_ScienceQA.json --image_folder data/finetune_data --vision_tower model/clip-vit-large-patch14-336 --video_tower model/clip-vit-large-patch14-336 --vision_tower_type clip --video_tower_type clip --mm_projector_type ldpnet --mm_vision_select_layer -2 --mm_use_im_start_end False --mm_use_im_patch_token False --image_aspect_ratio pad --group_by_modality_length True --bf16 False --output_dir outputs/mobilevlm1.7b/mobilevlm_v2-2.finetune --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 --evaluation_strategy no --save_strategy steps --save_steps 2000 --save_total_limit 1 --learning_rate 4e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 False --model_max_length 2048 --gradient_checkpointing True --dataloader_num_workers 4 --lazy_preprocess True --report_to none
[2024-03-27 15:23:43,212] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-27 15:23:44,956] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-03-27 15:23:44,958] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-03-27 15:23:44,958] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-03-27 15:23:44,958] [INFO] [launch.py:163:main] dist_world_size=2
[2024-03-27 15:23:44,958] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-03-27 15:23:51,158] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/media/bian/sdb/zzq/MobileVLM/mobilevlm/train/llama_flash_attn.py:108: UserWarning: Flash attention is only supported on A100 or H100 GPU during training due to head dim > 64 backward.ref: https://github.com/HazyResearch/flash-attention/issues/190#issuecomment-1523359593
warnings.warn(
[2024-03-27 15:23:51,859] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/media/bian/sdb/zzq/MobileVLM/mobilevlm/train/llama_flash_attn.py:108: UserWarning: Flash attention is only supported on A100 or H100 GPU during training due to head dim > 64 backward.ref: https://github.com/HazyResearch/flash-attention/issues/190#issuecomment-1523359593
warnings.warn(
[2024-03-27 15:23:52,544] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-27 15:23:52,545] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-27 15:23:52,545] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-03-27 15:23:53,269] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-27 15:23:53,269] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-27 15:23:58,760] [WARNING] [partition_parameters.py:836:_post_init_method] param `class_embedding` in CLIPVisionEmbeddings not on GPU so was not broadcasted from rank 0
[2024-03-27 15:24:00,602] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 1.68B parameters
Adding LoRA adapters...
Traceback (most recent call last):
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 3489, in <module>
main()
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 3482, in main
globals = debugger.run(setup['file'], None, None, is_module)
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2510, in run
return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2517, in _exec
globals = pydevd_runpy.run_path(file, globals, '__main__')
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "mobilevlm/train/train_mem.py", line 16, in <module>
train()
File "/media/bian/sdb/zzq/MobileVLM/mobilevlm/train/train.py", line 810, in train
model = get_peft_model(model, lora_config)
File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/mapping.py", line 98, in get_peft_model
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/peft_model.py", line 893, in __init__
super().__init__(model, peft_config, adapter_name)
File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/peft_model.py", line 112, in __init__
self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 180, in __init__
self.add_adapter(adapter_name, self.peft_config[adapter_name])
File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 194, in add_adapter
self._find_and_replace(adapter_name)
File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 352, in _find_and_replace
new_module = self._create_new_module(lora_config, adapter_name, target)
File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 305, in _create_new_module
raise ValueError(
ValueError: Target module LlamaDecoderLayer(
(self_attn): LlamaAttention(
(q_proj): Linear(in_features=2048, out_features=2048, bias=False)
(k_proj): Linear(in_features=2048, out_features=2048, bias=False)
(v_proj): Linear(in_features=2048, out_features=2048, bias=False)
(o_proj): Linear(in_features=2048, out_features=2048, bias=False)
(rotary_emb): LlamaRotaryEmbedding()
)
(mlp): LlamaMLP(
(gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
(up_proj): Linear(in_features=2048, out_features=5632, bias=False)
(down_proj): Linear(in_features=5632, out_features=2048, bias=False)
(act_fn): SiLUActivation()
)
(input_layernorm): LlamaRMSNorm()
(post_attention_layernorm): LlamaRMSNorm()
) is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.
[2024-03-27 15:24:29,011] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3033390
[2024-03-27 15:24:29,287] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3033391
[2024-03-27 15:24:29,288] [ERROR] [launch.py:321:sigkill_handler] ['/home/bian/anaconda3/envs/mobilevlm/bin/python', '-u', 'mobilevlm/train/train_mem.py', '--local_rank=1', '--deepspeed', 'scripts/deepspeed/zero3.json', '--lora_enable', 'True', '--lora_r', '128', '--lora_alpha', '256', '--learning_rate', '2e-4', '--model_name_or_path', 'model/MobileVLM_V2-1.7B', '--version', 'v1', '--data_path', 'data/finetune_data/filtered_ScienceQA.json', '--image_folder', 'data/finetune_data', '--vision_tower', 'model/clip-vit-large-patch14-336', '--video_tower', 'model/clip-vit-large-patch14-336', '--vision_tower_type', 'clip', '--video_tower_type', 'clip', '--mm_projector_type', 'ldpnet', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--group_by_modality_length', 'True', '--bf16', 'False', '--output_dir', 'outputs/mobilevlm1.7b/mobilevlm_v2-2.finetune', '--num_train_epochs', '1', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '2000', '--save_total_limit', '1', '--learning_rate', '4e-5', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'False', '--model_max_length', '2048', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '4', '--lazy_preprocess', 'True', '--report_to', 'none'] exits with return code = 1
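The ValueError above says that a name in the LoRA target list resolved to a whole LlamaDecoderLayer rather than to a Linear layer. One possible cause, stated here as an assumption and not confirmed in this thread, is that target names are matched by suffix against module paths, so a short name such as "0" also selects a path like "model.layers.0". The sketch below illustrates that failure mode with stand-in classes and hypothetical module paths; check peft/tuners/lora.py in your own environment for the actual matching rule.

```python
def unsupported_matches(named_modules, target_names, supported=("Linear", "Conv1D")):
    """Return (path, type name) for modules that a suffix match selects
    but whose type is not in the supported set."""
    bad = []
    for path, module in named_modules:
        if any(path.endswith(target) for target in target_names):
            type_name = type(module).__name__
            if type_name not in supported:
                bad.append((path, type_name))
    return bad


# Stand-in classes so the sketch runs without torch; names mirror the log above.
class Linear:
    pass


class LlamaDecoderLayer:
    pass


if __name__ == "__main__":
    modules = [
        ("model.layers.0", LlamaDecoderLayer()),
        ("model.layers.0.self_attn.q_proj", Linear()),
        ("model.mm_projector.mlp.0", Linear()),  # hypothetical path
    ]
    # "0" is a hypothetical target name picked up from a Sequential child.
    print(unsupported_matches(modules, ["q_proj", "0"]))
```

With a real model, the same function could be called as unsupported_matches(model.named_modules(), lora_config.target_modules) just before get_peft_model, assuming target_modules is a list of names and not a regex string.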
Please refer to commit 688fdec. You can then start like this:
bash run_v1.sh mobilevlm3b finetune.lora ${LANGUAGE_MODEL} ${VISION_MODEL} ${OUTPUT_DIR}
bash run_v1.sh mobilevlm3b test ${OUTPUT_DIR}/mobilevlm-2.finetune
Where can I download mm_projector.bin? The script passes it as:
--pretrain_mm_mlp_adapter ${OUTPUT_DIR_PT}/mm_projector.bin \