InternLM / InternLM-XComposer

InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Apache License 2.0

Fine-tuning of quantized internlm/internlm-xcomposer2-4khd-7b model? #402

Open zhuraromdev opened 3 months ago

zhuraromdev commented 3 months ago

Hello,

I have a question regarding fine-tuning of the quantized internlm/internlm-xcomposer2-4khd-7b model. I quantized the 4khd model with lmdeploy, and I am now trying to fine-tune it. However, I am getting the following issue during this process. Do you have any suggestions on how to solve it?

Env:

(intern_clean) ubuntu@ip-172-31-18-91:~/InternLM-XComposer/finetune$ lmdeploy check_env
Matplotlib is building the font cache; this may take a moment.
sys.platform: linux
Python: 3.9.19 (main, May  6 2024, 19:43:03) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3: NVIDIA A10G
CUDA_HOME: /usr/local/cuda-12.1
NVCC: Cuda compilation tools, release 12.1, V12.1.105
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.2.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.3.2 (Git Hash 2dc95a2ad0841e29db8b22fbccaf3e5da7992b01)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.7  (built against CUDA 12.2)
    - Built with CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.2.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, 

TorchVision: 0.17.2+cu121
LMDeploy: 0.5.1+5840351
transformers: 4.33.2
gradio: 4.13.0
fastapi: 0.111.1
pydantic: 2.8.2
triton: 2.2.0
NVIDIA Topology: 
        GPU0    GPU1    GPU2    GPU3    CPU Affinity    NUMA Affinity   GPU NUMA ID
GPU0     X      PHB     PHB     PHB     0-47    0               N/A
GPU1    PHB      X      PHB     PHB     0-47    0               N/A
GPU2    PHB     PHB      X      PHB     0-47    0               N/A
GPU3    PHB     PHB     PHB      X      0-47    0               N/A

Legend:

  X    = Self
  SYS  = Connection traversing PCIe as well as the SMP interconnect between NUMA nodes (e.g., QPI/UPI)
  NODE = Connection traversing PCIe as well as the interconnect between PCIe Host Bridges within a NUMA node
  PHB  = Connection traversing PCIe as well as a PCIe Host Bridge (typically the CPU)
  PXB  = Connection traversing multiple PCIe bridges (without traversing the PCIe Host Bridge)
  PIX  = Connection traversing at most a single PCIe bridge
  NV#  = Connection traversing a bonded set of # NVLinks

finetune_lora.sh

#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
DIR=`pwd`

export TOKEN="TOKEN"

export MODEL="agentsea/internlm-xcomposer2-4khd-7b-4bit"
# export DATA="path of data"
export DATA="data.txt"

GPUS_PER_NODE=4
NNODES=1
NODE_RANK=0
MASTER_ADDR=localhost
MASTER_PORT=6001

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

torchrun $DISTRIBUTED_ARGS finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
    --given_num True \
    --bf16 True \
    --fix_vit True \
    --fix_sampler True \
    --use_lora True \
    --hd_num 16 \
    --output_dir output/finetune_lora \
    --num_train_epochs 5 \
    --batch_size 2 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 16 \
    --evaluation_strategy "no" \
    --save_strategy "epoch" \
    --save_total_limit 1 \
    --learning_rate 5e-5 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "none" \
    --max_length 16384 \
    --deepspeed ds_config_zero2.json \
    --gradient_checkpointing True
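Editor's note: `MODEL` here points at a 4-bit (lmdeploy-quantized) checkpoint. Such checkpoints store packed `qweight`/`qzeros`/`scales` tensors instead of plain `.weight` tensors, which is exactly the mismatch the warnings in the traceback report. A minimal sketch (not from the repo; suffix names taken from the warning messages) of how to spot this from a checkpoint's tensor names:

```python
# Packed quantized checkpoints (AWQ/GPTQ-style, as produced by lmdeploy lite)
# replace each linear layer's .weight with a qweight/qzeros/scales triple.
# A plain nn.Linear-based model class cannot consume these tensors.
QUANT_SUFFIXES = (".qweight", ".qzeros", ".scales")

def looks_quantized(tensor_names):
    """Return True if any tensor name uses the packed quantized layout."""
    return any(name.endswith(QUANT_SUFFIXES) for name in tensor_names)

# Tensor names copied verbatim from the "were not used" warning below:
packed = [
    "model.layers.12.feed_forward.w3.qzeros",
    "model.layers.26.feed_forward.w1.qweight",
    "model.layers.19.attention.wqkv.scales",
]
plain = ["model.layers.23.attention.wo.weight"]

print(looks_quantized(packed))  # True
print(looks_quantized(plain))   # False
```

If the checkpoint is packed but the model class builds ordinary linear layers, the packed tensors are silently discarded and the linear weights are freshly (randomly) initialized, which matches the "newly initialized" warning at the end of the traceback.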

ds_config_zero2.json

{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },
    "bf16": {
        "enabled": "auto"
    },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "none",
            "pin_memory": true
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": true
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 100,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
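Editor's note: the `"auto"` fields in this config are filled in by the HF Trainer from the CLI flags in `finetune_lora.sh`. A quick sanity check of the effective batch size those flags imply (world size of 4 assumed from `GPUS_PER_NODE`):

```python
# How the Trainer resolves "auto" in ds_config_zero2.json:
# train_batch_size = micro batch * grad accumulation * number of processes
per_device_train_batch_size = 1   # --per_device_train_batch_size
gradient_accumulation_steps = 16  # --gradient_accumulation_steps
world_size = 4                    # GPUS_PER_NODE in finetune_lora.sh

train_micro_batch_size_per_gpu = per_device_train_batch_size
train_batch_size = (per_device_train_batch_size
                    * gradient_accumulation_steps
                    * world_size)
print(train_batch_size)  # 64
```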

data.txt

data/single_turn_single_image_example.json 0.01
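Editor's note: each line of `data.txt` names one dataset JSON followed by a sampling ratio; the exact semantics of the ratio are defined in `finetune.py`, so the following parsing sketch is only an illustration of the file format:

```python
# Parse a data.txt where each line is "<json path> <sampling ratio>".
def parse_data_file(text):
    datasets = []
    for line in text.strip().splitlines():
        path, ratio = line.rsplit(" ", 1)  # ratio is the last field
        datasets.append((path, float(ratio)))
    return datasets

print(parse_data_file("data/single_turn_single_image_example.json 0.01"))
```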

Traceback:

(intern_clean) ubuntu@ip-172-31-18-91:~/InternLM-XComposer/finetune$ sh finetune_lora.sh 
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] 
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] *****************************************
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed. 
[2024-07-25 16:59:29,877] torch.distributed.run: [WARNING] *****************************************
[2024-07-25 16:59:32,011] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-25 16:59:32,019] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-25 16:59:32,047] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2024-07-25 16:59:32,050] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-07-25 16:59:34,054] [INFO] [comm.py:637:init_distributed] cdb=None
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-07-25 16:59:34,099] [INFO] [comm.py:637:init_distributed] cdb=None
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
  warnings.warn(
[2024-07-25 16:59:34,127] [INFO] [comm.py:637:init_distributed] cdb=None
[2024-07-25 16:59:34,127] [INFO] [comm.py:668:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
[2024-07-25 16:59:34,131] [INFO] [comm.py:637:init_distributed] cdb=None
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
Load model from: agentsea/internlm-xcomposer2-4khd-7b-4bit
Load model from: agentsea/internlm-xcomposer2-4khd-7b-4bit
Load model from: agentsea/internlm-xcomposer2-4khd-7b-4bit
Load model from: agentsea/internlm-xcomposer2-4khd-7b-4bit
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/utils/generic.py:311: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  torch.utils._pytree._register_pytree_node(
Set max length to 16384
Set max length to 16384
Set max length to 16384
Set max length to 16384
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.55it/s]
Some weights of the model checkpoint at agentsea/internlm-xcomposer2-4khd-7b-4bit were not used when initializing InternLMXComposer2ForCausalLM: ['model.layers.12.feed_forward.w3.qzeros', 'model.layers.26.feed_forward.w1.qweight', 'model.layers.19.attention.wqkv.scales', 'model.layers.0.feed_forward.w1.scales', 'model.layers.4.attention.wqkv.qweight', 'model.layers.1.attention.wqkv.scales', 'model.layers.30.feed_forward.w3.qzeros', 'model.layers.29.attention.wqkv.qzeros', 'model.layers.15.attention.wo.qweight', 'model.layers.29.feed_forward.w2.scales', 'model.layers.3.feed_forward.w1.qzeros', 'model.layers.10.feed_forward.w2.qzeros', 'model.layers.19.feed_forward.w1.scales', 'model.layers.18.feed_forward.w2.scales', 'model.layers.14.feed_forward.w2.qzeros', 'model.layers.30.feed_forward.w2.qzeros', 'model.layers.16.feed_forward.w3.scales', 'model.layers.8.feed_forward.w2.qweight', 'model.layers.31.feed_forward.w3.qzeros', 'model.layers.29.feed_forward.w3.qweight', 'model.layers.0.feed_forward.w1.qzeros', 'model.layers.23.attention.wqkv.scales', 'model.layers.28.attention.wqkv.qweight', 'model.layers.1.attention.wqkv.qzeros', 'model.layers.10.attention.wo.qweight', 'model.layers.7.attention.wqkv.qweight', 'model.layers.24.feed_forward.w3.qzeros', 'model.layers.25.attention.wo.qweight', 'model.layers.27.feed_forward.w2.scales', 'model.layers.18.feed_forward.w2.qzeros', 'model.layers.18.feed_forward.w3.scales', 'model.layers.13.feed_forward.w3.qweight', 'model.layers.2.feed_forward.w3.qzeros', 'model.layers.3.feed_forward.w2.qzeros', 'model.layers.24.feed_forward.w2.qweight', 'model.layers.20.feed_forward.w3.qweight', 'model.layers.24.attention.wqkv.qzeros', 'model.layers.3.feed_forward.w1.qweight', 'model.layers.22.attention.wqkv.scales', 'model.layers.10.feed_forward.w3.scales', 'model.layers.20.feed_forward.w1.qweight', 'model.layers.5.attention.wqkv.qzeros', 'model.layers.11.attention.wo.qweight', 'model.layers.11.attention.wqkv.qweight', 
'model.layers.16.feed_forward.w1.qweight', 'model.layers.24.attention.wqkv.scales', 'model.layers.10.feed_forward.w1.qzeros', 'model.layers.23.attention.wqkv.qzeros', 'model.layers.21.attention.wqkv.qzeros', 'model.layers.7.feed_forward.w2.qweight', 'model.layers.7.attention.wo.qzeros', 'model.layers.9.attention.wo.qzeros', 'model.layers.29.feed_forward.w2.qweight', 'model.layers.24.feed_forward.w1.scales', 'model.layers.20.feed_forward.w1.scales', 'model.layers.15.feed_forward.w3.scales', 'model.layers.5.attention.wo.scales', 'model.layers.27.attention.wo.scales', 'model.layers.23.feed_forward.w1.qzeros', 'model.layers.25.feed_forward.w1.qzeros', 'model.layers.2.feed_forward.w2.qweight', 'model.layers.20.feed_forward.w2.scales', 'model.layers.12.attention.wqkv.qzeros', 'model.layers.14.attention.wo.qzeros', 'model.layers.22.feed_forward.w2.qzeros', 'model.layers.25.feed_forward.w3.qzeros', 'model.layers.6.feed_forward.w3.scales', 'model.layers.29.feed_forward.w3.qzeros', 'model.layers.19.attention.wo.qweight', 'model.layers.2.attention.wqkv.scales', 'model.layers.1.feed_forward.w2.scales', 'model.layers.19.feed_forward.w1.qweight', 'model.layers.13.attention.wo.qweight', 'model.layers.3.attention.wqkv.qweight', 'model.layers.23.feed_forward.w2.qweight', 'model.layers.22.attention.wo.qzeros', 'model.layers.3.feed_forward.w3.qweight', 'model.layers.16.feed_forward.w1.scales', 'model.layers.27.feed_forward.w1.qweight', 'model.layers.6.attention.wo.qzeros', 'model.layers.5.feed_forward.w1.scales', 'model.layers.1.feed_forward.w1.qzeros', 'model.layers.6.attention.wo.scales', 'model.layers.7.feed_forward.w1.qweight', 'model.layers.22.feed_forward.w3.qweight', 'model.layers.2.feed_forward.w2.scales', 'model.layers.11.feed_forward.w3.qzeros', 'model.layers.13.feed_forward.w1.qzeros', 'model.layers.22.feed_forward.w3.scales', 'model.layers.5.feed_forward.w1.qweight', 'model.layers.28.attention.wqkv.scales', 'model.layers.26.feed_forward.w2.qweight', 
'model.layers.4.attention.wo.scales', 'model.layers.19.feed_forward.w3.qzeros', 'model.layers.16.attention.wo.qzeros', 'model.layers.21.feed_forward.w3.qzeros', 'model.layers.16.feed_forward.w2.qzeros', 'model.layers.12.attention.wo.qzeros', 'model.layers.19.attention.wo.scales', 'model.layers.11.feed_forward.w3.qweight', 'model.layers.1.feed_forward.w1.scales', 'model.layers.26.feed_forward.w3.qweight', 'model.layers.31.feed_forward.w1.scales', 'model.layers.17.feed_forward.w2.scales', 'model.layers.13.attention.wo.scales', 'model.layers.21.attention.wo.qweight', 'model.layers.21.attention.wo.scales', 'model.layers.12.feed_forward.w1.scales', 'model.layers.31.attention.wo.qweight', 'model.layers.8.feed_forward.w2.qzeros', 'model.layers.31.feed_forward.w3.scales', 'model.layers.19.attention.wo.qzeros', 'model.layers.11.attention.wo.scales', 'model.layers.28.attention.wo.qweight', 'model.layers.6.feed_forward.w2.qzeros', 'model.layers.18.feed_forward.w1.qweight', 'model.layers.10.feed_forward.w1.scales', 'model.layers.8.feed_forward.w3.qzeros', 'model.layers.31.feed_forward.w2.qweight', 'model.layers.8.attention.wqkv.scales', 'model.layers.3.attention.wo.qweight', 'model.layers.6.feed_forward.w2.qweight', 'model.layers.0.attention.wo.qweight', 'model.layers.23.feed_forward.w2.qzeros', 'model.layers.21.feed_forward.w1.scales', 'model.layers.29.attention.wqkv.qweight', 'model.layers.3.attention.wo.scales', 'model.layers.25.feed_forward.w3.qweight', 'model.layers.30.feed_forward.w3.qweight', 'model.layers.31.attention.wo.scales', 'model.layers.17.attention.wqkv.scales', 'model.layers.29.attention.wo.qweight', 'model.layers.5.attention.wo.qweight', 'model.layers.10.attention.wqkv.qweight', 'model.layers.15.feed_forward.w2.scales', 'model.layers.29.attention.wqkv.scales', 'model.layers.14.feed_forward.w3.qweight', 'model.layers.7.feed_forward.w3.qweight', 'model.layers.16.feed_forward.w3.qweight', 'model.layers.26.attention.wqkv.qzeros', 
'model.layers.28.feed_forward.w3.scales', 'model.layers.20.attention.wo.qweight', 'model.layers.25.feed_forward.w1.scales', 'model.layers.26.attention.wo.qweight', 'model.layers.21.feed_forward.w2.qweight', 'model.layers.17.feed_forward.w3.scales', 'model.layers.27.attention.wo.qzeros', 'model.layers.15.attention.wo.scales', 'model.layers.17.feed_forward.w2.qzeros', 'model.layers.12.attention.wqkv.scales', 'model.layers.11.attention.wqkv.scales', 'model.layers.28.feed_forward.w2.qzeros', 'model.layers.1.feed_forward.w3.scales', 'model.layers.0.attention.wqkv.qzeros', 'model.layers.30.attention.wo.qweight', 'model.layers.28.feed_forward.w1.scales', 'model.layers.18.attention.wo.scales', 'model.layers.17.feed_forward.w3.qweight', 'model.layers.3.feed_forward.w2.qweight', 'model.layers.1.feed_forward.w1.qweight', 'model.layers.23.feed_forward.w3.qweight', 'model.layers.9.feed_forward.w2.qzeros', 'model.layers.9.feed_forward.w1.qzeros', 'model.layers.28.feed_forward.w1.qzeros', 'model.layers.21.feed_forward.w1.qweight', 'model.layers.25.attention.wqkv.qzeros', 'model.layers.29.feed_forward.w1.qweight', 'model.layers.4.attention.wo.qzeros', 'model.layers.13.feed_forward.w1.qweight', 'model.layers.15.attention.wqkv.qzeros', 'model.layers.13.attention.wo.qzeros', 'model.layers.26.feed_forward.w1.qzeros', 'model.layers.8.attention.wo.qzeros', 'model.layers.14.feed_forward.w1.qzeros', 'model.layers.4.feed_forward.w3.scales', 'model.layers.22.feed_forward.w2.qweight', 'model.layers.7.feed_forward.w2.qzeros', 'model.layers.29.attention.wo.qzeros', 'model.layers.3.attention.wo.qzeros', 'model.layers.8.feed_forward.w2.scales', 'model.layers.5.feed_forward.w2.scales', 'model.layers.17.attention.wo.qzeros', 'model.layers.11.attention.wqkv.qzeros', 'model.layers.2.feed_forward.w1.scales', 'model.layers.24.attention.wo.qweight', 'model.layers.24.feed_forward.w1.qweight', 'model.layers.4.feed_forward.w3.qweight', 'model.layers.6.feed_forward.w1.qzeros', 
'model.layers.26.attention.wo.scales', 'model.layers.20.feed_forward.w1.qzeros', 'model.layers.30.feed_forward.w1.scales', 'model.layers.22.feed_forward.w1.qzeros', 'model.layers.15.feed_forward.w2.qzeros', 'model.layers.5.feed_forward.w1.qzeros', 'model.layers.7.feed_forward.w1.scales', 'model.layers.0.feed_forward.w2.qweight', 'model.layers.17.attention.wo.scales', 'model.layers.20.feed_forward.w2.qzeros', 'model.layers.20.attention.wqkv.scales', 'model.layers.18.feed_forward.w3.qzeros', 'model.layers.26.attention.wo.qzeros', 'model.layers.14.feed_forward.w2.qweight', 'model.layers.14.feed_forward.w1.qweight', 'model.layers.10.feed_forward.w2.scales', 'model.layers.10.attention.wo.scales', 'model.layers.9.attention.wo.scales', 'model.layers.0.attention.wqkv.qweight', 'model.layers.12.feed_forward.w3.qweight', 'model.layers.22.feed_forward.w3.qzeros', 'model.layers.9.feed_forward.w1.qweight', 'model.layers.8.feed_forward.w3.scales', 'model.layers.23.attention.wo.qweight', 'model.layers.22.attention.wqkv.qweight', 'model.layers.23.attention.wo.qzeros', 'model.layers.31.feed_forward.w2.scales', 'model.layers.5.feed_forward.w3.scales', 'model.layers.10.attention.wqkv.scales', 'model.layers.14.attention.wo.scales', 'model.layers.14.attention.wqkv.qzeros', 'model.layers.19.feed_forward.w3.qweight', 'model.layers.4.attention.wo.qweight', 'model.layers.4.feed_forward.w2.qzeros', 'model.layers.12.feed_forward.w1.qweight', 'model.layers.11.feed_forward.w1.scales', 'model.layers.4.feed_forward.w3.qzeros', 'model.layers.29.attention.wo.scales', 'model.layers.24.attention.wqkv.qweight', 'model.layers.4.feed_forward.w1.qzeros', 'model.layers.18.attention.wqkv.qzeros', 'model.layers.24.feed_forward.w3.scales', 'model.layers.18.attention.wqkv.qweight', 'model.layers.21.feed_forward.w3.qweight', 'model.layers.6.feed_forward.w3.qweight', 'model.layers.22.feed_forward.w1.qweight', 'model.layers.22.attention.wo.qweight', 'model.layers.21.attention.wqkv.qweight', 
'model.layers.6.feed_forward.w2.scales', 'model.layers.24.feed_forward.w1.qzeros', 'model.layers.22.attention.wo.scales', 'model.layers.30.attention.wqkv.scales', 'model.layers.8.feed_forward.w1.qweight', 'model.layers.7.attention.wqkv.scales', 'model.layers.25.feed_forward.w2.scales', 'model.layers.24.attention.wo.qzeros', 'model.layers.15.feed_forward.w3.qweight', 'model.layers.24.feed_forward.w2.scales', 'model.layers.25.attention.wqkv.qweight', 'model.layers.4.attention.wqkv.qzeros', 'model.layers.1.feed_forward.w2.qzeros', 'model.layers.23.feed_forward.w3.scales', 'model.layers.13.feed_forward.w3.qzeros', 'model.layers.8.attention.wo.qweight', 'model.layers.0.feed_forward.w3.qweight', 'model.layers.29.feed_forward.w1.scales', 'model.layers.30.feed_forward.w3.scales', 'model.layers.4.attention.wqkv.scales', 'model.layers.27.feed_forward.w2.qzeros', 'model.layers.17.attention.wqkv.qzeros', 'model.layers.15.feed_forward.w2.qweight', 'model.layers.17.attention.wqkv.qweight', 'model.layers.11.attention.wo.qzeros', 'model.layers.20.feed_forward.w2.qweight', 'model.layers.28.attention.wo.qzeros', 'model.layers.10.attention.wqkv.qzeros', 'model.layers.18.attention.wqkv.scales', 'model.layers.7.attention.wqkv.qzeros', 'model.layers.2.attention.wqkv.qweight', 'model.layers.1.feed_forward.w3.qzeros', 'model.layers.31.attention.wqkv.qzeros', 'model.layers.0.feed_forward.w3.scales', 'model.layers.9.attention.wqkv.qzeros', 'model.layers.2.attention.wo.qzeros', 'model.layers.1.feed_forward.w2.qweight', 'model.layers.26.feed_forward.w2.scales', 'model.layers.18.feed_forward.w2.qweight', 'model.layers.1.attention.wo.qweight', 'model.layers.27.feed_forward.w1.qzeros', 'model.layers.30.attention.wqkv.qweight', 'model.layers.19.attention.wqkv.qzeros', 'model.layers.14.attention.wo.qweight', 'model.layers.13.feed_forward.w3.scales', 'model.layers.9.feed_forward.w3.scales', 'model.layers.22.feed_forward.w1.scales', 'model.layers.24.attention.wo.scales', 
'model.layers.5.attention.wqkv.qweight', 'model.layers.6.attention.wo.qweight', 'model.layers.10.attention.wo.qzeros', 'model.layers.15.feed_forward.w1.scales', 'model.layers.0.feed_forward.w3.qzeros', 'model.layers.14.feed_forward.w3.scales', 'model.layers.12.feed_forward.w3.scales', 'model.layers.22.feed_forward.w2.scales', 'model.layers.8.feed_forward.w3.qweight', 'model.layers.31.feed_forward.w3.qweight', 'model.layers.28.feed_forward.w3.qzeros', 'model.layers.19.feed_forward.w1.qzeros', 'model.layers.31.feed_forward.w1.qweight', 'model.layers.14.attention.wqkv.qweight', 'model.layers.11.feed_forward.w3.scales', 'model.layers.12.attention.wo.qweight', 'model.layers.25.feed_forward.w1.qweight', 'model.layers.3.feed_forward.w1.scales', 'model.layers.4.feed_forward.w2.qweight', 'model.layers.27.feed_forward.w3.scales', 'model.layers.13.feed_forward.w1.scales', 'model.layers.2.feed_forward.w3.qweight', 'model.layers.6.attention.wqkv.qweight', 'model.layers.23.attention.wo.scales', 'model.layers.13.feed_forward.w2.qzeros', 'model.layers.31.attention.wqkv.qweight', 'model.layers.9.feed_forward.w3.qzeros', 'model.layers.6.feed_forward.w1.qweight', 'model.layers.5.feed_forward.w2.qzeros', 'model.layers.3.feed_forward.w3.scales', 'model.layers.13.feed_forward.w2.qweight', 'model.layers.1.attention.wo.scales', 'model.layers.3.feed_forward.w3.qzeros', 'model.layers.7.feed_forward.w3.scales', 'model.layers.30.feed_forward.w1.qzeros', 'model.layers.27.feed_forward.w3.qweight', 'model.layers.21.feed_forward.w3.scales', 'model.layers.6.feed_forward.w1.scales', 'model.layers.16.feed_forward.w3.qzeros', 'model.layers.4.feed_forward.w2.scales', 'model.layers.28.attention.wqkv.qzeros', 'model.layers.25.attention.wo.scales', 'model.layers.11.feed_forward.w2.scales', 'model.layers.29.feed_forward.w1.qzeros', 'model.layers.11.feed_forward.w2.qzeros', 'model.layers.21.attention.wo.qzeros', 'model.layers.0.attention.wo.scales', 'model.layers.30.feed_forward.w2.qweight', 
'model.layers.27.attention.wqkv.qweight', 'model.layers.9.feed_forward.w1.scales', 'model.layers.2.feed_forward.w3.scales', 'model.layers.24.feed_forward.w2.qzeros', 'model.layers.20.attention.wqkv.qweight', 'model.layers.27.feed_forward.w1.scales', 'model.layers.18.attention.wo.qweight', 'model.layers.10.feed_forward.w3.qzeros', 'model.layers.17.attention.wo.qweight', 'model.layers.3.attention.wqkv.qzeros', 'model.layers.14.feed_forward.w1.scales', 'model.layers.12.feed_forward.w2.qweight', 'model.layers.11.feed_forward.w1.qzeros', 'model.layers.18.feed_forward.w1.qzeros', 'model.layers.8.attention.wo.scales', 'model.layers.17.feed_forward.w2.qweight', 'model.layers.27.attention.wqkv.scales', 'model.layers.12.feed_forward.w2.scales', 'model.layers.0.feed_forward.w1.qweight', 'model.layers.9.attention.wo.qweight', 'model.layers.3.feed_forward.w2.scales', 'model.layers.0.feed_forward.w2.scales', 'model.layers.2.attention.wo.qweight', 'model.layers.8.feed_forward.w1.scales', 'model.layers.21.feed_forward.w1.qzeros', 'model.layers.14.attention.wqkv.scales', 'model.layers.4.feed_forward.w1.qweight', 'model.layers.30.attention.wo.qzeros', 'model.layers.1.feed_forward.w3.qweight', 'model.layers.18.feed_forward.w3.qweight', 'model.layers.9.feed_forward.w2.scales', 'model.layers.31.feed_forward.w2.qzeros', 'model.layers.16.attention.wqkv.scales', 'model.layers.25.attention.wqkv.scales', 'model.layers.9.feed_forward.w3.qweight', 'model.layers.2.feed_forward.w2.qzeros', 'model.layers.23.feed_forward.w2.scales', 'model.layers.12.attention.wqkv.qweight', 'model.layers.16.attention.wo.scales', 'model.layers.7.feed_forward.w1.qzeros', 'model.layers.27.attention.wqkv.qzeros', 'model.layers.19.feed_forward.w2.qweight', 'model.layers.20.attention.wo.qzeros', 'model.layers.26.feed_forward.w3.qzeros', 'model.layers.3.attention.wqkv.scales', 'model.layers.23.feed_forward.w3.qzeros', 'model.layers.7.attention.wo.qweight', 'model.layers.14.feed_forward.w2.scales', 
'model.layers.1.attention.wo.qzeros', 'model.layers.20.attention.wqkv.qzeros', 'model.layers.7.feed_forward.w2.scales', 'model.layers.0.attention.wo.qzeros', 'model.layers.28.feed_forward.w3.qweight', 'model.layers.7.attention.wo.scales', 'model.layers.23.feed_forward.w1.qweight', 'model.layers.21.attention.wqkv.scales', 'model.layers.5.attention.wo.qzeros', 'model.layers.12.feed_forward.w1.qzeros', 'model.layers.24.feed_forward.w3.qweight', 'model.layers.17.feed_forward.w1.qzeros', 'model.layers.26.attention.wqkv.scales', 'model.layers.8.attention.wqkv.qzeros', 'model.layers.8.attention.wqkv.qweight', 'model.layers.6.feed_forward.w3.qzeros', 'model.layers.26.feed_forward.w3.scales', 'model.layers.14.feed_forward.w3.qzeros', 'model.layers.28.attention.wo.scales', 'model.layers.15.attention.wqkv.scales', 'model.layers.19.feed_forward.w3.scales', 'model.layers.17.feed_forward.w1.qweight', 'model.layers.25.attention.wo.qzeros', 'model.layers.16.attention.wqkv.qzeros', 'model.layers.16.feed_forward.w1.qzeros', 'model.layers.30.feed_forward.w1.qweight', 'model.layers.16.attention.wo.qweight', 'model.layers.2.attention.wqkv.qzeros', 'model.layers.23.feed_forward.w1.scales', 'model.layers.10.feed_forward.w1.qweight', 'model.layers.29.feed_forward.w2.qzeros', 'model.layers.6.attention.wqkv.qzeros', 'model.layers.15.feed_forward.w1.qweight', 'model.layers.0.feed_forward.w2.qzeros', 'model.layers.27.feed_forward.w3.qzeros', 'model.layers.31.attention.wo.qzeros', 'model.layers.2.attention.wo.scales', 'model.layers.17.feed_forward.w3.qzeros', 'model.layers.19.feed_forward.w2.scales', 'model.layers.18.attention.wo.qzeros', 'model.layers.12.attention.wo.scales', 'model.layers.27.attention.wo.qweight', 'model.layers.30.attention.wo.scales', 'model.layers.20.feed_forward.w3.qzeros', 'model.layers.26.feed_forward.w1.scales', 'model.layers.30.attention.wqkv.qzeros', 'model.layers.30.feed_forward.w2.scales', 'model.layers.29.feed_forward.w3.scales', 
'model.layers.8.feed_forward.w1.qzeros', 'model.layers.0.attention.wqkv.scales', 'model.layers.27.feed_forward.w2.qweight', 'model.layers.21.feed_forward.w2.qzeros', 'model.layers.23.attention.wqkv.qweight', 'model.layers.7.feed_forward.w3.qzeros', 'model.layers.2.feed_forward.w1.qweight', 'model.layers.20.feed_forward.w3.scales', 'model.layers.15.attention.wo.qzeros', 'model.layers.5.attention.wqkv.scales', 'model.layers.10.feed_forward.w2.qweight', 'model.layers.22.attention.wqkv.qzeros', 'model.layers.13.feed_forward.w2.scales', 'model.layers.15.feed_forward.w3.qzeros', 'model.layers.5.feed_forward.w3.qzeros', 'model.layers.28.feed_forward.w1.qweight', 'model.layers.11.feed_forward.w1.qweight', 'model.layers.11.feed_forward.w2.qweight', 'model.layers.21.feed_forward.w2.scales', 'model.layers.1.attention.wqkv.qweight', 'model.layers.5.feed_forward.w3.qweight', 'model.layers.25.feed_forward.w3.scales', 'model.layers.12.feed_forward.w2.qzeros', 'model.layers.28.feed_forward.w2.qweight', 'model.layers.20.attention.wo.scales', 'model.layers.13.attention.wqkv.scales', 'model.layers.13.attention.wqkv.qweight', 'model.layers.18.feed_forward.w1.scales', 'model.layers.28.feed_forward.w2.scales', 'model.layers.19.feed_forward.w2.qzeros', 'model.layers.31.attention.wqkv.scales', 'model.layers.5.feed_forward.w2.qweight', 'model.layers.15.attention.wqkv.qweight', 'model.layers.13.attention.wqkv.qzeros', 'model.layers.16.feed_forward.w2.scales', 'model.layers.25.feed_forward.w2.qweight', 'model.layers.15.feed_forward.w1.qzeros', 'model.layers.16.feed_forward.w2.qweight', 'model.layers.9.feed_forward.w2.qweight', 'model.layers.17.feed_forward.w1.scales', 'model.layers.25.feed_forward.w2.qzeros', 'model.layers.26.attention.wqkv.qweight', 'model.layers.9.attention.wqkv.qweight', 'model.layers.9.attention.wqkv.scales', 'model.layers.26.feed_forward.w2.qzeros', 'model.layers.31.feed_forward.w1.qzeros', 'model.layers.16.attention.wqkv.qweight', 
'model.layers.4.feed_forward.w1.scales', 'model.layers.6.attention.wqkv.scales', 'model.layers.19.attention.wqkv.qweight', 'model.layers.10.feed_forward.w3.qweight', 'model.layers.2.feed_forward.w1.qzeros']
- This IS expected if you are initializing InternLMXComposer2ForCausalLM from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing InternLMXComposer2ForCausalLM from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of InternLMXComposer2ForCausalLM were not initialized from the model checkpoint at agentsea/internlm-xcomposer2-4khd-7b-4bit and are newly initialized: ['model.layers.23.attention.wo.weight', 'model.layers.31.attention.wqkv.weight', 'model.layers.24.feed_forward.w3.weight', 'model.layers.7.feed_forward.w3.weight', 'model.layers.18.attention.wqkv.weight', 'model.layers.19.feed_forward.w2.weight', 'model.layers.30.attention.wo.weight', 'model.layers.11.attention.wo.weight', 'model.layers.3.attention.wqkv.weight', 'model.layers.25.feed_forward.w2.weight', 'model.layers.9.attention.wo.weight', 'model.layers.10.feed_forward.w3.weight', 'model.layers.0.feed_forward.w3.weight', 'model.layers.28.feed_forward.w2.weight', 'model.layers.19.feed_forward.w1.weight', 'model.layers.12.feed_forward.w2.weight', 'model.layers.31.feed_forward.w2.weight', 'model.layers.0.attention.wqkv.weight', 'model.layers.31.feed_forward.w3.weight', 'model.layers.22.feed_forward.w1.weight', 'model.layers.4.feed_forward.w3.weight', 'model.layers.2.feed_forward.w3.weight', 'model.layers.3.attention.wo.weight', 'model.layers.29.feed_forward.w1.weight', 'model.layers.28.attention.wo.weight', 'model.layers.31.attention.wo.weight', 'model.layers.23.feed_forward.w2.weight', 'model.layers.2.feed_forward.w1.weight', 'model.layers.14.feed_forward.w2.weight', 'model.layers.20.feed_forward.w2.weight', 'model.layers.12.attention.wo.weight', 'model.layers.10.attention.wo.weight', 'model.layers.1.feed_forward.w2.weight', 'model.layers.25.attention.wqkv.weight', 'model.layers.17.attention.wqkv.weight', 'model.layers.0.feed_forward.w1.weight', 'model.layers.10.feed_forward.w1.weight', 'model.layers.14.feed_forward.w3.weight', 'model.layers.30.feed_forward.w3.weight', 'model.layers.19.attention.wqkv.weight', 'model.layers.16.attention.wo.weight', 'model.layers.6.feed_forward.w3.weight', 'model.layers.26.feed_forward.w1.weight', 'model.layers.15.attention.wqkv.weight', 
'model.layers.21.feed_forward.w1.weight', 'model.layers.13.attention.wqkv.weight', 'model.layers.22.attention.wo.weight', 'model.layers.11.feed_forward.w2.weight', 'model.layers.11.feed_forward.w3.weight', 'model.layers.9.attention.wqkv.weight', 'model.layers.0.attention.wo.weight', 'model.layers.16.feed_forward.w3.weight', 'model.layers.18.feed_forward.w1.weight', 'model.layers.11.feed_forward.w1.weight', 'model.layers.12.feed_forward.w3.weight', 'model.layers.17.feed_forward.w2.weight', 'model.layers.1.attention.wqkv.weight', 'model.layers.6.attention.wqkv.weight', 'model.layers.22.attention.wqkv.weight', 'model.layers.16.feed_forward.w2.weight', 'model.layers.20.attention.wo.weight', 'model.layers.5.attention.wo.weight', 'model.layers.18.feed_forward.w2.weight', 'model.layers.17.feed_forward.w3.weight', 'model.layers.29.feed_forward.w2.weight', 'model.layers.26.feed_forward.w2.weight', 'model.layers.28.feed_forward.w3.weight', 'model.layers.21.attention.wqkv.weight', 'model.layers.14.attention.wqkv.weight', 'model.layers.5.attention.wqkv.weight', 'model.layers.6.feed_forward.w2.weight', 'model.layers.22.feed_forward.w2.weight', 'model.layers.2.feed_forward.w2.weight', 'model.layers.3.feed_forward.w1.weight', 'model.layers.31.feed_forward.w1.weight', 'model.layers.16.feed_forward.w1.weight', 'model.layers.21.attention.wo.weight', 'model.layers.4.attention.wo.weight', 'model.layers.26.feed_forward.w3.weight', 'model.layers.9.feed_forward.w1.weight', 'model.layers.29.attention.wo.weight', 'model.layers.29.attention.wqkv.weight', 'model.layers.9.feed_forward.w2.weight', 'model.layers.27.feed_forward.w2.weight', 'model.layers.28.feed_forward.w1.weight', 'model.layers.26.attention.wo.weight', 'model.layers.8.attention.wo.weight', 'model.layers.12.attention.wqkv.weight', 'model.layers.8.feed_forward.w3.weight', 'model.layers.10.feed_forward.w2.weight', 'model.layers.2.attention.wqkv.weight', 'model.layers.4.feed_forward.w2.weight', 
'model.layers.24.feed_forward.w2.weight', 'model.layers.15.feed_forward.w1.weight', 'model.layers.27.attention.wo.weight', 'model.layers.3.feed_forward.w2.weight', 'model.layers.16.attention.wqkv.weight', 'model.layers.1.feed_forward.w1.weight', 'model.layers.8.feed_forward.w1.weight', 'model.layers.20.feed_forward.w1.weight', 'model.layers.25.feed_forward.w1.weight', 'model.layers.7.attention.wqkv.weight', 'model.layers.20.feed_forward.w3.weight', 'model.layers.13.feed_forward.w2.weight', 'model.layers.7.attention.wo.weight', 'model.layers.23.feed_forward.w3.weight', 'model.layers.18.attention.wo.weight', 'model.layers.4.feed_forward.w1.weight', 'model.layers.23.attention.wqkv.weight', 'model.layers.5.feed_forward.w1.weight', 'model.layers.19.attention.wo.weight', 'model.layers.4.attention.wqkv.weight', 'model.layers.13.attention.wo.weight', 'model.layers.13.feed_forward.w1.weight', 'model.layers.24.attention.wqkv.weight', 'model.layers.27.feed_forward.w1.weight', 'model.layers.29.feed_forward.w3.weight', 'model.layers.9.feed_forward.w3.weight', 'model.layers.14.feed_forward.w1.weight', 'model.layers.6.feed_forward.w1.weight', 'model.layers.2.attention.wo.weight', 'model.layers.30.feed_forward.w2.weight', 'model.layers.21.feed_forward.w3.weight', 'model.layers.1.attention.wo.weight', 'model.layers.14.attention.wo.weight', 'model.layers.7.feed_forward.w2.weight', 'model.layers.17.attention.wo.weight', 'model.layers.8.feed_forward.w2.weight', 'model.layers.11.attention.wqkv.weight', 'model.layers.3.feed_forward.w3.weight', 'model.layers.5.feed_forward.w2.weight', 'model.layers.10.attention.wqkv.weight', 'model.layers.21.feed_forward.w2.weight', 'model.layers.19.feed_forward.w3.weight', 'model.layers.30.feed_forward.w1.weight', 'model.layers.25.attention.wo.weight', 'model.layers.26.attention.wqkv.weight', 'model.layers.15.attention.wo.weight', 'model.layers.27.attention.wqkv.weight', 'model.layers.18.feed_forward.w3.weight', 'model.layers.28.attention.wqkv.weight', 
'model.layers.7.feed_forward.w1.weight', 'model.layers.5.feed_forward.w3.weight', 'model.layers.23.feed_forward.w1.weight', 'model.layers.1.feed_forward.w3.weight', 'model.layers.24.attention.wo.weight', 'model.layers.8.attention.wqkv.weight', 'model.layers.15.feed_forward.w3.weight', 'model.layers.20.attention.wqkv.weight', 'model.layers.27.feed_forward.w3.weight', 'model.layers.17.feed_forward.w1.weight', 'model.layers.12.feed_forward.w1.weight', 'model.layers.24.feed_forward.w1.weight', 'model.layers.15.feed_forward.w2.weight', 'model.layers.6.attention.wo.weight', 'model.layers.30.attention.wqkv.weight', 'model.layers.25.feed_forward.w3.weight', 'model.layers.22.feed_forward.w3.weight', 'model.layers.13.feed_forward.w3.weight', 'model.layers.0.feed_forward.w2.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [00:02<00:00,  1.53it/s]
trainable params: 151,003,136 || all params: 8,748,496,896 || trainable%: 1.7260466317252952
init mix data at rank 3
load 10 data
5samples is loaded
True
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
  warnings.warn(
Traceback (most recent call last):
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 310, in <module>
    train()
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 297, in train
    trainer = Trainer(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision.  if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`. 
trainable params: 151,003,136 || all params: 8,748,496,896 || trainable%: 1.7260466317252952
Loading data...
Load 10 samples from ['data/single_turn_single_image_example.json', '0.01']
init mix data at rank 0
load 10 data
5samples is loaded
True
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
  warnings.warn(
Traceback (most recent call last):
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 310, in <module>
    train()
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 297, in train
    trainer = Trainer(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision.  if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`. 
trainable params: 151,003,136 || all params: 8,748,496,896 || trainable%: 1.7260466317252952
init mix data at rank 2
load 10 data
5samples is loaded
True
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
  warnings.warn(
Traceback (most recent call last):
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 310, in <module>
    train()
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 297, in train
    trainer = Trainer(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision.  if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`. 
trainable params: 151,003,136 || all params: 8,748,496,896 || trainable%: 1.7260466317252952
init mix data at rank 1
load 10 data
5samples is loaded
True
/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/accelerate/accelerator.py:451: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None)
  warnings.warn(
Traceback (most recent call last):
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 310, in <module>
    train()
  File "/home/ubuntu/InternLM-XComposer/finetune/finetune.py", line 297, in train
    trainer = Trainer(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision.  if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`. 
[2024-07-25 17:00:45,016] torch.distributed.elastic.multiprocessing.api: [WARNING] Sending process 10096 closing signal SIGTERM
[2024-07-25 17:00:45,331] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 10095) of binary: /home/ubuntu/miniconda3/envs/intern_clean/bin/python
Traceback (most recent call last):
  File "/home/ubuntu/miniconda3/envs/intern_clean/bin/torchrun", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 347, in wrapper
    return f(*args, **kwargs)
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/run.py", line 812, in main
    run(args)
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/run.py", line 803, in run
    elastic_launch(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 135, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/torch/distributed/launcher/api.py", line 268, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
finetune.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2024-07-25_17:00:45
  host      : ip-172-31-18-91.ec2.internal
  rank      : 2 (local_rank: 2)
  exitcode  : 1 (pid: 10097)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
  time      : 2024-07-25_17:00:45
  host      : ip-172-31-18-91.ec2.internal
  rank      : 3 (local_rank: 3)
  exitcode  : 1 (pid: 10098)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-07-25_17:00:45
  host      : ip-172-31-18-91.ec2.internal
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 10095)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

Also, I am using finetune.py without any changes.

Screenshot 2024-07-25 at 19 03 49

yuhangzang commented 3 months ago

Do you use the fine-tune code from IXC 2.0? It is different from the IXC 2.5 fine-tune code.

zhuraromdev commented 3 months ago

Hey, yes, I am using the code from here: https://github.com/InternLM/InternLM-XComposer/blob/main/finetune/finetune.py

zhuraromdev commented 3 months ago

@yuhangzang I have tried to run the code with the 2.0 version. However, I am still getting the same error:

Traceback (most recent call last):
  File "/home/ubuntu/InternLM-XComposer/InternLM-XComposer-2.0/finetune/finetune.py", line 318, in <module>
    train()
  File "/home/ubuntu/InternLM-XComposer/InternLM-XComposer-2.0/finetune/finetune.py", line 305, in train
    trainer = Trainer(
  File "/home/ubuntu/miniconda3/envs/intern_clean/lib/python3.9/site-packages/transformers/trainer.py", line 409, in __init__
    raise ValueError(
ValueError: The model you want to train is loaded in 8-bit precision.  if you want to fine-tune an 8-bit model, please make sure that you have installed `bitsandbytes>=0.37.0`. 

Let me know which additional information is needed, thank you!