Meituan-AutoML / MobileVLM

Strong and Open Vision Language Assistant for Mobile Devices

finetune with lora error #17

Open BadUncleBoy opened 8 months ago

BadUncleBoy commented 8 months ago

I met this error when fine-tuning with LoRA:

ValueError: Target module LlamaDecoderLayer(
  (self_attn): LlamaAttention(
    (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
    (k_proj): Linear(in_features=2048, out_features=2048, bias=False)
    (v_proj): Linear(in_features=2048, out_features=2048, bias=False)
    (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (mlp): LlamaMLP(
    (gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
    (up_proj): Linear(in_features=2048, out_features=5632, bias=False)
    (down_proj): Linear(in_features=5632, out_features=2048, bias=False)
    (act_fn): SiLUActivation()
  )
  (input_layernorm): LlamaRMSNorm()
  (post_attention_layernorm): LlamaRMSNorm()
) is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.
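For context, peft 0.4.0 matches each list-style `target_modules` entry as a suffix of the full module name and then wraps whatever it matched; wrapping is only implemented for `torch.nn.Linear` and `Conv1D`, so a pattern that happens to match a whole `LlamaDecoderLayer` raises exactly this ValueError. A toy sketch of the matching (illustrative only, not MobileVLM code; the module tree and target names are made up):

```python
import torch.nn as nn

# Toy stand-in for the LLaMA stack: a container "layer" holding Linear leaves.
model = nn.ModuleDict({
    "layers": nn.ModuleList([
        nn.ModuleDict({"q_proj": nn.Linear(2048, 2048, bias=False)}),
    ]),
})

bad_targets = ["layers.0"]    # suffix-matches the whole container module
good_targets = ["q_proj"]     # suffix-matches only the nn.Linear leaf

for targets in (bad_targets, good_targets):
    for name, module in model.named_modules():
        # peft 0.4.0 (tuners/lora.py) does: any(key.endswith(t) for t in targets)
        if any(name.endswith(t) for t in targets):
            print(f"{targets} -> {name}: {type(module).__name__}, "
                  f"wrappable={isinstance(module, nn.Linear)}")
```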

er-muyue commented 8 months ago

Have you made any modifications to the original released code? If so, please provide more information about what you modified. If not, please share your startup command, like the one here. Thanks.

Dylan-LDR commented 8 months ago

> I met this error when fine-tuning with LoRA: ValueError: Target module LlamaDecoderLayer(...) is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.

I ran into the same problem. The command is unmodified from the `finetune.lora` case in run.sh.

sxu1997 commented 7 months ago

> I met this error when fine-tuning with LoRA: ValueError: Target module LlamaDecoderLayer(...) is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.

Could you provide more information about this error, for example the surrounding context logs?

QvQKing commented 7 months ago

Traceback (most recent call last):
  File "/hy-tmp/MobileVLM-main/mobilevlm/train/train_mem.py", line 13, in <module>
    train()
  File "/hy-tmp/MobileVLM-main/mobilevlm/train/train.py", line 807, in train
    model = get_peft_model(model, lora_config)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/mapping.py", line 98, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/peft_model.py", line 893, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/peft_model.py", line 112, in __init__
    self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 180, in __init__
    self.add_adapter(adapter_name, self.peft_config[adapter_name])
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 194, in add_adapter
    self._find_and_replace(adapter_name)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 352, in _find_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 305, in _create_new_module
    raise ValueError(
ValueError: Target module LlamaDecoderLayer(...) is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.
[2024-02-27 04:18:03,745] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 7582
[2024-02-27 04:18:03,746] [ERROR] [launch.py:321:sigkill_handler] ['/usr/local/miniconda3/envs/mobilevlm/bin/python', '-u', 'mobilevlm/train/train_mem.py', '--local_rank=0', '--deepspeed', 'scripts/deepspeed/zero2.json', '--lora_enable', 'True', '--lora_r', '128', '--lora_alpha', '256', '--learning_rate', '2e-4', '--model_name_or_path', './mtgv/MobileVLM_V2-1.7B', '--version', 'v1', '--data_path', 'data/eccv_train.json', '--image_folder', 'data/eccv_train', '--vision_tower', './mtgv/clip-vit-large-patch14-336', '--vision_tower_type', 'clip', '--pretrain_mm_mlp_adapter', './finetune-results/mobilevlm-1.pretrain/mm_projector.bin', '--mm_projector_type', 'ldpnet', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--group_by_modality_length', 'True', '--bf16', 'True', '--output_dir', './finetune-results/mobilevlm-2.finetune-lora', '--num_train_epochs', '1', '--per_device_train_batch_size', '16', '--per_device_eval_batch_size', '4', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '50000', '--save_total_limit', '1', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'True', '--model_max_length', '2048', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '4', '--lazy_preprocess', 'True', '--report_to', 'none'] exits with return code = 1
[2024-02-27 04:18:07,883] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Traceback (most recent call last):
  File "/hy-tmp/MobileVLM-main/scripts/mergelora.py", line 51, in <module>
    merge_lora(sys.argv[1], sys.argv[2], sys.argv[3])
  File "/hy-tmp/MobileVLM-main/scripts/mergelora.py", line 16, in merge_lora
    lora_cfg_pretrained = AutoConfig.from_pretrained(model_path)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1023, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 620, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 675, in _get_config_dict
    resolved_config_file = cached_file(
  File "/usr/local/miniconda3/envs/mobilevlm/lib/python3.10/site-packages/transformers/utils/hub.py", line 400, in cached_file
    raise EnvironmentError(
OSError: ./finetune-results/mobilevlm-2.finetune-lora does not appear to have a file named config.json. Checkout 'https://huggingface.co/./finetune-results/mobilevlm-2.finetune-lora/None' for available files.
Done.

This is my problem.
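The second traceback is a downstream symptom rather than a separate bug: because `get_peft_model` failed, training never saved a checkpoint, so `mergelora.py` finds no `config.json` in `./finetune-results/mobilevlm-2.finetune-lora`. A hypothetical fail-fast guard for `merge_lora` (argument names and order are assumed, not the repo's actual code):

```python
import os
import sys

def merge_lora(model_path, lora_path, output_path):
    # If fine-tuning crashed before saving, the directory has no config.json,
    # and AutoConfig.from_pretrained would raise the confusing OSError above.
    if not os.path.isfile(os.path.join(model_path, "config.json")):
        sys.exit(f"{model_path} has no config.json; did fine-tuning finish?")
    # ... load AutoConfig.from_pretrained(model_path) and merge as usual ...
```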

sxu1997 commented 7 months ago

> Traceback (most recent call last): ... ValueError: Target module LlamaDecoderLayer(...) is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported. ... OSError: ./finetune-results/mobilevlm-2.finetune-lora does not appear to have a file named config.json. This is my problem.

Have you made sure your deepspeed and transformers versions are consistent with ours?
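For reference, the key pins from the environment listing later in this thread are deepspeed 0.9.5, transformers 4.33.1, and peft 0.4.0 (taken from that listing, not from an official requirements file); a quick way to compare your installs:

```python
from importlib.metadata import version

# Expected versions copied from the conda env listing in this thread.
expected = {"deepspeed": "0.9.5", "transformers": "4.33.1", "peft": "0.4.0"}
for pkg, want in expected.items():
    got = version(pkg)
    print(f"{pkg}: installed {got}, expected {want}"
          + ("" if got == want else "  <-- mismatch"))
```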

self-discipline-zhu commented 6 months ago

I'm having the same problem. My scenario: fine-tuning MobileVLM V2 1.7B with LoRA.

VSCode Debug Settings

{
    "configurations": [
        {
            "name": "Python Debugger: Python File",
            "type": "debugpy",
            "request": "launch",
            "program": "/home/bian/anaconda3/envs/mobilevlm/bin/deepspeed",
            "justMyCode": true,
            "console": "integratedTerminal",
            "args": [
                "mobilevlm/train/train_mem.py",
                "--deepspeed", "scripts/deepspeed/zero3.json",
                "--lora_enable", "True",
                "--lora_r", "128", 
                "--lora_alpha", "256",
                "--learning_rate", "2e-4",
                "--model_name_or_path", "model/MobileVLM_V2-1.7B",
                "--version", "v1",
                "--data_path", "data/finetune_data/filtered_ScienceQA.json",
                "--image_folder", "data/finetune_data",
                "--vision_tower", "model/clip-vit-large-patch14-336",
                "--vision_tower_type", "clip",
                "--mm_projector_type", "ldpnetv2",
                "--mm_vision_select_layer", "-2",
                "--mm_use_im_start_end", "False",
                "--mm_use_im_patch_token", "False",
                "--image_aspect_ratio", "pad",
                "--group_by_modality_length", "True",
                "--bf16", "False", // 
                "--output_dir", "outputs/mobilevlm1.7b/mobilevlm_v2-2.finetune",
                "--num_train_epochs", "1",
                "--per_device_train_batch_size", "1",
                "--per_device_eval_batch_size", "1",
                "--gradient_accumulation_steps", "1",
                "--evaluation_strategy", "no",
                "--save_strategy", "steps",
                "--save_steps", "2000",
                "--save_total_limit", "1",
                "--learning_rate", "4e-5",
                "--weight_decay", "0.",
                "--warmup_ratio", "0.03",
                "--lr_scheduler_type", "cosine",
                "--logging_steps", "1",
                "--tf32", "False", //
                "--model_max_length", "2048",
                "--gradient_checkpointing", "True",
                "--dataloader_num_workers", "4",
                "--lazy_preprocess", "True",
                "--report_to", "none"
            ],
            "env": {
                "PYTHONPATH": "/media/bian/sdb/zzq/MobileVLM"
            }
        }
    ]
}
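One side note on this config: `--learning_rate` is passed twice (`2e-4`, then `4e-5`). `HfArgumentParser` is built on argparse, where the last occurrence of a repeated option wins, so the effective learning rate should be `4e-5`:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--learning_rate", type=float)

# The same flag twice, as in the launch config above: the last value wins.
args = parser.parse_args(["--learning_rate", "2e-4", "--learning_rate", "4e-5"])
print(args.learning_rate)  # 4e-05
```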

Dependency versions

_libgcc_mutex             0.1                        main    defaults
_openmp_mutex             5.1                       1_gnu    defaults
accelerate                0.21.0                   pypi_0    pypi
aiofiles                  23.2.1                   pypi_0    pypi
aiohttp                   3.9.3                    pypi_0    pypi
aiosignal                 1.3.1                    pypi_0    pypi
altair                    5.2.0                    pypi_0    pypi
anyio                     4.3.0                    pypi_0    pypi
async-timeout             4.0.3                    pypi_0    pypi
attrs                     23.2.0                   pypi_0    pypi
av                        11.0.0                   pypi_0    pypi
bitsandbytes              0.41.0                   pypi_0    pypi
bzip2                     1.0.8                h5eee18b_5    defaults
ca-certificates           2023.12.12           h06a4308_0    defaults
certifi                   2024.2.2                 pypi_0    pypi
charset-normalizer        3.3.2                    pypi_0    pypi
click                     8.1.7                    pypi_0    pypi
cmake                     3.28.3                   pypi_0    pypi
contourpy                 1.2.0                    pypi_0    pypi
cycler                    0.12.1                   pypi_0    pypi
decord                    0.6.0                    pypi_0    pypi
deepspeed                 0.9.5                    pypi_0    pypi
einops                    0.6.1                    pypi_0    pypi
einops-exts               0.0.4                    pypi_0    pypi
exceptiongroup            1.2.0                    pypi_0    pypi
fairscale                 0.4.13                   pypi_0    pypi
fastapi                   0.103.0                  pypi_0    pypi
ffmpy                     0.3.2                    pypi_0    pypi
filelock                  3.13.1                   pypi_0    pypi
flash-attn                2.3.2                    pypi_0    pypi
fonttools                 4.49.0                   pypi_0    pypi
frozenlist                1.4.1                    pypi_0    pypi
fsspec                    2024.2.0                 pypi_0    pypi
fvcore                    0.1.5.post20221221          pypi_0    pypi
gradio                    3.35.2                   pypi_0    pypi
gradio-client             0.10.0                   pypi_0    pypi
h11                       0.14.0                   pypi_0    pypi
hjson                     3.1.0                    pypi_0    pypi
httpcore                  0.17.3                   pypi_0    pypi
httpx                     0.24.0                   pypi_0    pypi
huggingface-hub           0.21.4                   pypi_0    pypi
idna                      3.6                      pypi_0    pypi
ijson                     3.2.3                    pypi_0    pypi
iopath                    0.1.10                   pypi_0    pypi
jinja2                    3.1.3                    pypi_0    pypi
joblib                    1.3.2                    pypi_0    pypi
jsonschema                4.21.1                   pypi_0    pypi
jsonschema-specifications 2023.12.1                pypi_0    pypi
kiwisolver                1.4.5                    pypi_0    pypi
ld_impl_linux-64          2.38                 h1181459_1    defaults
libffi                    3.4.4                h6a678d5_0    defaults
libgcc-ng                 11.2.0               h1234567_1    defaults
libgomp                   11.2.0               h1234567_1    defaults
libstdcxx-ng              11.2.0               h1234567_1    defaults
libuuid                   1.41.5               h5eee18b_0    defaults
lightning-utilities       0.11.1                   pypi_0    pypi
linkify-it-py             2.0.3                    pypi_0    pypi
lit                       17.0.6                   pypi_0    pypi
markdown-it-py            2.2.0                    pypi_0    pypi
markdown2                 2.4.8                    pypi_0    pypi
markupsafe                2.1.5                    pypi_0    pypi
matplotlib                3.8.3                    pypi_0    pypi
mdit-py-plugins           0.3.3                    pypi_0    pypi
mdurl                     0.1.2                    pypi_0    pypi
mpmath                    1.3.0                    pypi_0    pypi
multidict                 6.0.5                    pypi_0    pypi
ncurses                   6.4                  h6a678d5_0    defaults
networkx                  3.2.1                    pypi_0    pypi
ninja                     1.11.1.1                 pypi_0    pypi
numpy                     1.25.0                   pypi_0    pypi
nvidia-cublas-cu11        11.10.3.66               pypi_0    pypi
nvidia-cuda-cupti-cu11    11.7.101                 pypi_0    pypi
nvidia-cuda-nvrtc-cu11    11.7.99                  pypi_0    pypi
nvidia-cuda-runtime-cu11  11.7.99                  pypi_0    pypi
nvidia-cudnn-cu11         8.5.0.96                 pypi_0    pypi
nvidia-cufft-cu11         10.9.0.58                pypi_0    pypi
nvidia-curand-cu11        10.2.10.91               pypi_0    pypi
nvidia-cusolver-cu11      11.4.0.1                 pypi_0    pypi
nvidia-cusparse-cu11      11.7.4.91                pypi_0    pypi
nvidia-nccl-cu11          2.14.3                   pypi_0    pypi
nvidia-nvtx-cu11          11.7.91                  pypi_0    pypi
openssl                   3.0.13               h7f8727e_0    defaults
orjson                    3.9.15                   pypi_0    pypi
packaging                 24.0                     pypi_0    pypi
pandas                    2.2.1                    pypi_0    pypi
parameterized             0.9.0                    pypi_0    pypi
peft                      0.4.0                    pypi_0    pypi
pillow                    10.2.0                   pypi_0    pypi
pip                       24.0                     pypi_0    pypi
portalocker               2.8.2                    pypi_0    pypi
psutil                    5.9.8                    pypi_0    pypi
py-cpuinfo                9.0.0                    pypi_0    pypi
pydantic                  1.10.13                  pypi_0    pypi
pydub                     0.25.1                   pypi_0    pypi
pygments                  2.17.2                   pypi_0    pypi
pyparsing                 3.1.2                    pypi_0    pypi
python                    3.10.13              h955ad1f_0    defaults
python-dateutil           2.9.0.post0              pypi_0    pypi
python-multipart          0.0.9                    pypi_0    pypi
pytorch-lightning         2.2.1                    pypi_0    pypi
pytorchvideo              0.1.5                    pypi_0    pypi
pytz                      2024.1                   pypi_0    pypi
pyyaml                    6.0.1                    pypi_0    pypi
readline                  8.2                  h5eee18b_0    defaults
referencing               0.33.0                   pypi_0    pypi
regex                     2023.12.25               pypi_0    pypi
requests                  2.28.2                   pypi_0    pypi
rpds-py                   0.18.0                   pypi_0    pypi
safetensors               0.4.2                    pypi_0    pypi
scikit-learn              1.2.2                    pypi_0    pypi
scipy                     1.12.0                   pypi_0    pypi
semantic-version          2.10.0                   pypi_0    pypi
sentencepiece             0.1.99                   pypi_0    pypi
setuptools                68.2.2          py310h06a4308_0    defaults
shortuuid                 1.0.11                   pypi_0    pypi
six                       1.16.0                   pypi_0    pypi
sniffio                   1.3.1                    pypi_0    pypi
sqlite                    3.41.2               h5eee18b_0    defaults
starlette                 0.27.0                   pypi_0    pypi
sympy                     1.12                     pypi_0    pypi
tabulate                  0.9.0                    pypi_0    pypi
termcolor                 2.4.0                    pypi_0    pypi
threadpoolctl             3.3.0                    pypi_0    pypi
timm                      0.9.12                   pypi_0    pypi
tk                        8.6.12               h1ccaba5_0    defaults
tokenizers                0.13.3                   pypi_0    pypi
toolz                     0.12.1                   pypi_0    pypi
torch                     2.0.1                    pypi_0    pypi
torchmetrics              1.3.2                    pypi_0    pypi
torchvision               0.15.2                   pypi_0    pypi
tqdm                      4.66.2                   pypi_0    pypi
transformers              4.33.1                   pypi_0    pypi
triton                    2.0.0                    pypi_0    pypi
typing-extensions         4.10.0                   pypi_0    pypi
tzdata                    2024.1                   pypi_0    pypi
uc-micro-py               1.0.3                    pypi_0    pypi
urllib3                   1.26.18                  pypi_0    pypi
uvicorn                   0.22.0                   pypi_0    pypi
websockets                11.0.3                   pypi_0    pypi
wheel                     0.41.2          py310h06a4308_0    defaults
xz                        5.4.6                h5eee18b_0    defaults
yacs                      0.1.8                    pypi_0    pypi
yarl                      1.9.4                    pypi_0    pypi
zlib                      1.2.13               h5eee18b_0    defaults

Issue logs

[2024-03-27 15:23:38,536] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-27 15:23:41,459] [WARNING] [runner.py:196:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2024-03-27 15:23:41,460] [INFO] [runner.py:555:main] cmd = /home/bian/anaconda3/envs/mobilevlm/bin/python -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None mobilevlm/train/train_mem.py --deepspeed scripts/deepspeed/zero3.json --lora_enable True --lora_r 128 --lora_alpha 256 --learning_rate 2e-4 --model_name_or_path model/MobileVLM_V2-1.7B --version v1 --data_path data/finetune_data/filtered_ScienceQA.json --image_folder data/finetune_data --vision_tower model/clip-vit-large-patch14-336 --video_tower model/clip-vit-large-patch14-336 --vision_tower_type clip --video_tower_type clip --mm_projector_type ldpnet --mm_vision_select_layer -2 --mm_use_im_start_end False --mm_use_im_patch_token False --image_aspect_ratio pad --group_by_modality_length True --bf16 False --output_dir outputs/mobilevlm1.7b/mobilevlm_v2-2.finetune --num_train_epochs 1 --per_device_train_batch_size 1 --per_device_eval_batch_size 1 --gradient_accumulation_steps 1 --evaluation_strategy no --save_strategy steps --save_steps 2000 --save_total_limit 1 --learning_rate 4e-5 --weight_decay 0. --warmup_ratio 0.03 --lr_scheduler_type cosine --logging_steps 1 --tf32 False --model_max_length 2048 --gradient_checkpointing True --dataloader_num_workers 4 --lazy_preprocess True --report_to none
[2024-03-27 15:23:43,212] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-03-27 15:23:44,956] [INFO] [launch.py:145:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2024-03-27 15:23:44,958] [INFO] [launch.py:151:main] nnodes=1, num_local_procs=2, node_rank=0
[2024-03-27 15:23:44,958] [INFO] [launch.py:162:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2024-03-27 15:23:44,958] [INFO] [launch.py:163:main] dist_world_size=2
[2024-03-27 15:23:44,958] [INFO] [launch.py:165:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2024-03-27 15:23:51,158] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/media/bian/sdb/zzq/MobileVLM/mobilevlm/train/llama_flash_attn.py:108: UserWarning: Flash attention is only supported on A100 or H100 GPU during training due to head dim > 64 backward.ref: https://github.com/HazyResearch/flash-attention/issues/190#issuecomment-1523359593
  warnings.warn(
[2024-03-27 15:23:51,859] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/media/bian/sdb/zzq/MobileVLM/mobilevlm/train/llama_flash_attn.py:108: UserWarning: Flash attention is only supported on A100 or H100 GPU during training due to head dim > 64 backward.ref: https://github.com/HazyResearch/flash-attention/issues/190#issuecomment-1523359593
  warnings.warn(
[2024-03-27 15:23:52,544] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-27 15:23:52,545] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-27 15:23:52,545] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-03-27 15:23:53,269] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-03-27 15:23:53,269] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-03-27 15:23:58,760] [WARNING] [partition_parameters.py:836:_post_init_method] param `class_embedding` in CLIPVisionEmbeddings not on GPU so was not broadcasted from rank 0
[2024-03-27 15:24:00,602] [INFO] [partition_parameters.py:453:__exit__] finished initializing model with 1.68B parameters
Adding LoRA adapters...
Traceback (most recent call last):
  File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 3489, in <module>
    main()
  File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 3482, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2510, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/pydevd.py", line 2517, in _exec
    globals = pydevd_runpy.run_path(file, globals, '__main__')
  File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/root/.vscode-server/extensions/ms-python.debugpy-2024.2.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "mobilevlm/train/train_mem.py", line 16, in <module>
    train()
  File "/media/bian/sdb/zzq/MobileVLM/mobilevlm/train/train.py", line 810, in train
    model = get_peft_model(model, lora_config)
  File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/mapping.py", line 98, in get_peft_model
    return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
  File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/peft_model.py", line 893, in __init__
    super().__init__(model, peft_config, adapter_name)
  File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/peft_model.py", line 112, in __init__
    self.base_model = PEFT_TYPE_TO_MODEL_MAPPING[peft_config.peft_type](
  File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 180, in __init__
    self.add_adapter(adapter_name, self.peft_config[adapter_name])
  File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 194, in add_adapter
    self._find_and_replace(adapter_name)
  File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 352, in _find_and_replace
    new_module = self._create_new_module(lora_config, adapter_name, target)
  File "/home/bian/anaconda3/envs/mobilevlm/lib/python3.10/site-packages/peft/tuners/lora.py", line 305, in _create_new_module
    raise ValueError(
ValueError: Target module LlamaDecoderLayer(
  (self_attn): LlamaAttention(
    (q_proj): Linear(in_features=2048, out_features=2048, bias=False)
    (k_proj): Linear(in_features=2048, out_features=2048, bias=False)
    (v_proj): Linear(in_features=2048, out_features=2048, bias=False)
    (o_proj): Linear(in_features=2048, out_features=2048, bias=False)
    (rotary_emb): LlamaRotaryEmbedding()
  )
  (mlp): LlamaMLP(
    (gate_proj): Linear(in_features=2048, out_features=5632, bias=False)
    (up_proj): Linear(in_features=2048, out_features=5632, bias=False)
    (down_proj): Linear(in_features=5632, out_features=2048, bias=False)
    (act_fn): SiLUActivation()
  )
  (input_layernorm): LlamaRMSNorm()
  (post_attention_layernorm): LlamaRMSNorm()
) is not supported. Currently, only `torch.nn.Linear` and `Conv1D` are supported.
[2024-03-27 15:24:29,011] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3033390
[2024-03-27 15:24:29,287] [INFO] [launch.py:315:sigkill_handler] Killing subprocess 3033391
[2024-03-27 15:24:29,288] [ERROR] [launch.py:321:sigkill_handler] ['/home/bian/anaconda3/envs/mobilevlm/bin/python', '-u', 'mobilevlm/train/train_mem.py', '--local_rank=1', '--deepspeed', 'scripts/deepspeed/zero3.json', '--lora_enable', 'True', '--lora_r', '128', '--lora_alpha', '256', '--learning_rate', '2e-4', '--model_name_or_path', 'model/MobileVLM_V2-1.7B', '--version', 'v1', '--data_path', 'data/finetune_data/filtered_ScienceQA.json', '--image_folder', 'data/finetune_data', '--vision_tower', 'model/clip-vit-large-patch14-336', '--video_tower', 'model/clip-vit-large-patch14-336', '--vision_tower_type', 'clip', '--video_tower_type', 'clip', '--mm_projector_type', 'ldpnet', '--mm_vision_select_layer', '-2', '--mm_use_im_start_end', 'False', '--mm_use_im_patch_token', 'False', '--image_aspect_ratio', 'pad', '--group_by_modality_length', 'True', '--bf16', 'False', '--output_dir', 'outputs/mobilevlm1.7b/mobilevlm_v2-2.finetune', '--num_train_epochs', '1', '--per_device_train_batch_size', '1', '--per_device_eval_batch_size', '1', '--gradient_accumulation_steps', '1', '--evaluation_strategy', 'no', '--save_strategy', 'steps', '--save_steps', '2000', '--save_total_limit', '1', '--learning_rate', '4e-5', '--weight_decay', '0.', '--warmup_ratio', '0.03', '--lr_scheduler_type', 'cosine', '--logging_steps', '1', '--tf32', 'False', '--model_max_length', '2048', '--gradient_checkpointing', 'True', '--dataloader_num_workers', '4', '--lazy_preprocess', 'True', '--report_to', 'none'] exits with return code = 1
sxu1997 commented 5 months ago

Please refer to commit 688fdec; then you can start it like this:

bash run_v1.sh mobilevlm3b finetune.lora ${LANGUAGE_MODEL} ${VISION_MODEL} ${OUTPUT_DIR}
bash run_v1.sh mobilevlm3b test ${OUTPUT_DIR}/mobilevlm-2.finetune
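In case it helps others before they update: the usual LLaVA-style remedy is to build `target_modules` from the names of `nn.Linear` leaves only, skipping the vision tower, projector, and `lm_head`. A sketch under those assumptions (I have not checked that this matches commit 688fdec exactly):

```python
import torch.nn as nn
from peft import LoraConfig, get_peft_model

def find_all_linear_names(model):
    # Leaf names of nn.Linear modules in the language model; skip the
    # multimodal parts that should not receive LoRA adapters.
    skip = ("vision_tower", "mm_projector", "lm_head")
    names = set()
    for name, module in model.named_modules():
        if any(s in name for s in skip):
            continue
        if isinstance(module, nn.Linear):
            names.add(name.split(".")[-1])
    return sorted(names)

def add_lora(model):
    lora_config = LoraConfig(
        r=128,
        lora_alpha=256,
        target_modules=find_all_linear_names(model),  # e.g. q_proj, k_proj, ...
        lora_dropout=0.05,
        bias="none",
        task_type="CAUSAL_LM",
    )
    return get_peft_model(model, lora_config)
```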

wuwu-C commented 4 months ago

Where can I download the mm_projector used for `--pretrain_mm_mlp_adapter ${OUTPUT_DIR_PT}/mm_projector.bin`?