QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.

[Badcase]: GPTQ on ModelScope: some weights not initialized #930

Closed: huashiyiqike closed this issue 1 month ago

huashiyiqike commented 1 month ago

Model Series

Qwen2.5

What are the models used?

qwen/Qwen2.5-14B-Instruct-GPTQ-Int4

What is the scenario where the problem happened?

Loading the model emits the warning: Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at <model path>

Is this badcase known and can it be solved using available techniques?

Information about environment

There is no such problem if I change the model to qwen/Qwen1.5-14B-Chat-GPTQ-Int4.

OS: ubuntu20.04 Python:3.10.9 NVIDIA driver: 560.35.02 cuda_11.7 pytorch: 2.0.1+cu117 requirements.txt: absl-py==1.4.0 accelerate==0.30.1 addict==2.4.0 aiofiles==23.1.0 aiohttp==3.8.4 aiosignal==1.3.1 aliyun-python-sdk-core==2.13.36 aliyun-python-sdk-kms==2.16.1 altair==4.2.2 anyio==3.6.2 appdirs==1.4.4 argilla==1.6.0 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 arrow==1.2.3 asttokens==2.2.1 async-timeout==4.0.2 attrs==23.1.0 backcall==0.2.0 backoff==2.2.1 beautifulsoup4==4.12.2 bitsandbytes==0.43.1 bleach==6.0.0 bottle==0.12.25 cachetools==5.3.0 certifi==2022.12.7 cffi==1.15.1 chardet==5.1.0 charset-normalizer==3.1.0 click==8.1.3 cmake==3.26.1 coloredlogs==15.0.1 comm==0.1.3 commonmark==0.9.1 contourpy==1.0.7 cpm-kernels==1.0.11 crcmod==1.7 cryptography==40.0.2 cycler==0.11.0 dataclasses-json==0.5.7 datasets==2.18.0 DBUtils==3.0.3 debugpy==1.6.7 decorator==5.1.1 deepspeed defusedxml==0.7.1 Deprecated==1.2.13 dill==0.3.6 docker-pycreds==0.4.0 docx2txt==0.8 einops==0.6.1 entrypoints==0.4 environs==9.5.0 et-xmlfile==1.1.0 executing==1.2.0 faiss-cpu==1.7.4 fastapi==0.95.1 fastjsonschema==2.16.3 ffmpy==0.3.0 filelock==3.11.0 flash-attn @ file:///mnt/c/Users/huash/Downloads/flash-attention-main%20%281%29/flash-attention-main/flash_attn-2.5.7%2Bcu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl#sha256=d2fde085802c694d67ab4c7573e38da54539dfdaab6b809b3e107f18c39ac3f0 flexgen==0.1.7 fonttools==4.39.3 fqdn==1.5.1 frozenlist==1.3.3 fsspec==2024.2.0 gast==0.5.4 gekko==1.1.1 gitdb==4.0.10 GitPython==3.1.31 google-auth==2.17.3 google-auth-oauthlib==1.0.0 gradio==3.23.0 gradio_client==0.1.3 graypy==2.1.0 greenlet==2.0.2 grpcio grpcio-tools==1.53.0 h11==0.14.0 hjson==3.1.0 httpcore==0.16.3 httpx==0.23.3 huggingface-hub==0.23.0 humanfriendly==10.0 icetk==0.0.4 idna==3.4 importlib-metadata==6.8.0 InstructorEmbedding==1.0.0 ipykernel==6.22.0 ipython==8.12.0 ipython-genutils==0.2.0 ipywidgets==8.0.6 isoduration==20.11.0 jedi==0.18.2 jieba==0.42.1 Jinja2==3.1.2 jmespath==0.10.0 joblib==1.2.0 jsonpointer==2.3 jsonschema==4.17.3 jupyter-events==0.6.3 jupyter_client==8.2.0 jupyter_core==5.3.0 jupyter_server==2.5.0 jupyter_server_terminals==0.4.4 jupyterlab-pygments==0.2.2 jupyterlab-widgets==3.0.7 kiwisolver==1.4.4 langchain==0.0.150 linkify-it-py==2.0.0 lit==16.0.0 llama-cpp-python==0.1.48 lxml==4.9.2 Markdown==3.4.3 markdown-it-py==2.2.0 markdown-to-json==2.1.0 markdown2==2.4.8 MarkupSafe==2.1.2 marshmallow==3.19.0 marshmallow-enum==1.5.1 matplotlib==3.7.1 matplotlib-inline==0.1.6 mdit-py-plugins==0.3.3 mdurl==0.1.2 mistune==2.0.5 modelscope==1.14.0 monotonic==1.6 mpmath==1.3.0 msg-parser==1.2.0 multidict==6.0.4 multiprocess==0.70.14 mypy-extensions==1.0.0 nbclassic==0.5.5 nbclient==0.7.3 nbconvert==7.3.1 nbformat==5.8.0 nest-asyncio==1.5.6 networkx==3.0 nh3==0.2.11 ninja==1.11.1.1 nltk==3.8.1 notebook==6.5.4 notebook_shim==0.2.2 numexpr==2.8.4 numpy==1.22.0 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 oauthlib==3.2.2 olefile==0.46 openapi-schema-pydantic==1.2.4 openpyxl==3.1.2 optimum==1.19.2 orjson==3.8.10 oss2==2.18.1 packaging==23.1 pandas==1.3.5 pandocfilters==1.5.0 parso==0.8.3 Paste==3.5.2 pathtools==0.1.2 pdfkit==1.0.0 pdfminer.six==20221105 pdfplumber==0.9.0 peft==0.10.0 pexpect==4.8.0 pickleshare==0.7.5 
Pillow==9.5.0 platformdirs==3.10.0 prometheus-client==0.16.0 prompt-toolkit==3.0.38 protobuf psutil==5.9.4 ptyprocess==0.7.0 PuLP==2.7.0 pure-eval==0.2.2 py-cpuinfo==9.0.0 pyarrow==16.0.0 pyarrow-hotfix==0.6 pyasn1==0.5.0 pyasn1-modules==0.3.0 pycparser==2.21 pycryptodome==3.17 pydantic==1.10.7 pydub==0.25.1 Pygments==2.15.1 pymilvus==2.2.8 PyMuPDF==1.22.3 PyMySQL==1.1.0 pyodbc==4.0.39 pypandoc==1.11 pyparsing==3.0.9 pyre-extensions==0.0.29 pyrsistent==0.19.3 python-dateutil==2.8.2 python-docx==0.8.11 python-dotenv==1.0.0 python-json-logger==2.0.7 python-magic==0.4.27 python-multipart==0.0.6 python-pptx==0.6.21 pytz==2023.3 PyYAML==6.0 pyzmq==25.0.2 quant-cuda==0.0.0 redis==4.6.0 regex==2023.3.23 requests==2.28.2 requests-oauthlib==1.3.1 responses==0.18.0 rfc3339-validator==0.1.4 rfc3986==1.5.0 rfc3986-validator==0.1.1 rich==13.0.1 rouge==1.0.1 rsa==4.9 rwkv==0.7.3 safetensors==0.4.3 scikit-learn==1.2.2 scipy==1.10.1 semantic-version==2.10.0 Send2Trash==1.8.0 sentence-transformers==2.2.2 sentencepiece==0.1.97 sentry-sdk==1.22.2 setproctitle==1.3.2 shortuuid==1.0.11 simplejson==3.19.1 six==1.16.0 smmap==5.0.0 sniffio==1.3.0 sortedcontainers==2.4.0 soupsieve==2.4.1 SQLAlchemy==2.0.10 SQLAlchemy-Utils==0.41.0 stack-data==0.6.2 starlette==0.26.1 style==1.1.0 svgwrite==1.4.3 sympy==1.11.1 tenacity==8.2.2 tensorboard==2.12.2 tensorboard-data-server==0.7.0 tensorboard-plugin-wit==1.8.1 terminado==0.17.1 threadpoolctl==3.1.0 tiktoken==0.3.3 tinycss2==1.2.1 tokenizers==0.19.1 tomli==2.0.1 toolz==0.12.0 torch==2.0.1 torchaudio torchvision tornado==6.3.1 tqdm==4.65.0 traitlets==5.9.0 transformers==4.40.2 transformers-stream-generator==0.0.4 triton==2.0.0 typing-inspect==0.8.0 typing_extensions==4.5.0 tzdata==2023.3 uc-micro-py==1.0.1 ujson==5.7.0 unstructured==0.6.2 update==0.0.1 uplink==0.9.7 uri-template==1.2.0 uritemplate==4.1.1 urllib3==1.26.15 uvicorn==0.21.1 Wand==0.6.11 wandb==0.15.2 wavedrom==2.0.3.post3 wcwidth==0.2.6 webcolors==1.13 webencodings==0.5.1 websocket-client==1.5.1 websockets==11.0.1 Werkzeug==2.2.3 widgetsnbextension==4.0.7 wrapt==1.14.1 xformers==0.0.20 XlsxWriter==3.1.0 xxhash==3.2.0 yapf==0.40.1 yarl==1.8.2 zipp==3.16.2

Description

The output is as follows:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.57s/it] Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at /home/wsl/.cache/modelscope/hub/qwen/Qwen2___5-14B-Instruct-GPTQ-Int4 and are newly initialized: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.11.mlp.down_proj.bias', 'model.layers.11.mlp.gate_proj.bias', 'model.layers.11.mlp.up_proj.bias', 'model.layers.11.self_attn.o_proj.bias', 'model.layers.12.mlp.down_proj.bias', 'model.layers.12.mlp.gate_proj.bias', 'model.layers.12.mlp.up_proj.bias', 'model.layers.12.self_attn.o_proj.bias', 'model.layers.13.mlp.down_proj.bias', 'model.layers.13.mlp.gate_proj.bias', 'model.layers.13.mlp.up_proj.bias', 'model.layers.13.self_attn.o_proj.bias', 'model.layers.14.mlp.down_proj.bias', 'model.layers.14.mlp.gate_proj.bias', 'model.layers.14.mlp.up_proj.bias', 'model.layers.14.self_attn.o_proj.bias', 'model.layers.15.mlp.down_proj.bias', 'model.layers.15.mlp.gate_proj.bias', 'model.layers.15.mlp.up_proj.bias', 'model.layers.15.self_attn.o_proj.bias', 'model.layers.16.mlp.down_proj.bias', 'model.layers.16.mlp.gate_proj.bias', 'model.layers.16.mlp.up_proj.bias', 'model.layers.16.self_attn.o_proj.bias', 'model.layers.17.mlp.down_proj.bias', 'model.layers.17.mlp.gate_proj.bias', 'model.layers.17.mlp.up_proj.bias', 'model.layers.17.self_attn.o_proj.bias', 'model.layers.18.mlp.down_proj.bias', 'model.layers.18.mlp.gate_proj.bias', 'model.layers.18.mlp.up_proj.bias', 'model.layers.18.self_attn.o_proj.bias', 'model.layers.19.mlp.down_proj.bias', 'model.layers.19.mlp.gate_proj.bias', 'model.layers.19.mlp.up_proj.bias', 'model.layers.19.self_attn.o_proj.bias', 'model.layers.2.mlp.down_proj.bias', 'model.layers.2.mlp.gate_proj.bias', 'model.layers.2.mlp.up_proj.bias', 'model.layers.2.self_attn.o_proj.bias', 'model.layers.20.mlp.down_proj.bias', 'model.layers.20.mlp.gate_proj.bias', 'model.layers.20.mlp.up_proj.bias', 'model.layers.20.self_attn.o_proj.bias', 'model.layers.21.mlp.down_proj.bias', 'model.layers.21.mlp.gate_proj.bias', 'model.layers.21.mlp.up_proj.bias', 'model.layers.21.self_attn.o_proj.bias', 'model.layers.22.mlp.down_proj.bias', 'model.layers.22.mlp.gate_proj.bias', 'model.layers.22.mlp.up_proj.bias', 'model.layers.22.self_attn.o_proj.bias', 'model.layers.23.mlp.down_proj.bias', 'model.layers.23.mlp.gate_proj.bias', 'model.layers.23.mlp.up_proj.bias', 'model.layers.23.self_attn.o_proj.bias', 'model.layers.24.mlp.down_proj.bias', 'model.layers.24.mlp.gate_proj.bias', 'model.layers.24.mlp.up_proj.bias', 'model.layers.24.self_attn.o_proj.bias', 'model.layers.25.mlp.down_proj.bias', 'model.layers.25.mlp.gate_proj.bias', 'model.layers.25.mlp.up_proj.bias', 'model.layers.25.self_attn.o_proj.bias', 'model.layers.26.mlp.down_proj.bias', 'model.layers.26.mlp.gate_proj.bias', 'model.layers.26.mlp.up_proj.bias', 'model.layers.26.self_attn.o_proj.bias', 'model.layers.27.mlp.down_proj.bias', 'model.layers.27.mlp.gate_proj.bias', 'model.layers.27.mlp.up_proj.bias', 'model.layers.27.self_attn.o_proj.bias', 'model.layers.28.mlp.down_proj.bias', 
'model.layers.28.mlp.gate_proj.bias', 'model.layers.28.mlp.up_proj.bias', 'model.layers.28.self_attn.o_proj.bias', 'model.layers.29.mlp.down_proj.bias', 'model.layers.29.mlp.gate_proj.bias', 'model.layers.29.mlp.up_proj.bias', 'model.layers.29.self_attn.o_proj.bias', 'model.layers.3.mlp.down_proj.bias', 'model.layers.3.mlp.gate_proj.bias', 'model.layers.3.mlp.up_proj.bias', 'model.layers.3.self_attn.o_proj.bias', 'model.layers.30.mlp.down_proj.bias', 'model.layers.30.mlp.gate_proj.bias', 'model.layers.30.mlp.up_proj.bias', 'model.layers.30.self_attn.o_proj.bias', 'model.layers.31.mlp.down_proj.bias', 'model.layers.31.mlp.gate_proj.bias', 'model.layers.31.mlp.up_proj.bias', 'model.layers.31.self_attn.o_proj.bias', 'model.layers.32.mlp.down_proj.bias', 'model.layers.32.mlp.gate_proj.bias', 'model.layers.32.mlp.up_proj.bias', 'model.layers.32.self_attn.o_proj.bias', 'model.layers.33.mlp.down_proj.bias', 'model.layers.33.mlp.gate_proj.bias', 'model.layers.33.mlp.up_proj.bias', 'model.layers.33.self_attn.o_proj.bias', 'model.layers.34.mlp.down_proj.bias', 'model.layers.34.mlp.gate_proj.bias', 'model.layers.34.mlp.up_proj.bias', 'model.layers.34.self_attn.o_proj.bias', 'model.layers.35.mlp.down_proj.bias', 'model.layers.35.mlp.gate_proj.bias', 'model.layers.35.mlp.up_proj.bias', 'model.layers.35.self_attn.o_proj.bias', 'model.layers.36.mlp.down_proj.bias', 'model.layers.36.mlp.gate_proj.bias', 'model.layers.36.mlp.up_proj.bias', 'model.layers.36.self_attn.o_proj.bias', 'model.layers.37.mlp.down_proj.bias', 'model.layers.37.mlp.gate_proj.bias', 'model.layers.37.mlp.up_proj.bias', 'model.layers.37.self_attn.o_proj.bias', 'model.layers.38.mlp.down_proj.bias', 'model.layers.38.mlp.gate_proj.bias', 'model.layers.38.mlp.up_proj.bias', 'model.layers.38.self_attn.o_proj.bias', 'model.layers.39.mlp.down_proj.bias', 'model.layers.39.mlp.gate_proj.bias', 'model.layers.39.mlp.up_proj.bias', 'model.layers.39.self_attn.o_proj.bias', 'model.layers.4.mlp.down_proj.bias', 'model.layers.4.mlp.gate_proj.bias', 'model.layers.4.mlp.up_proj.bias', 'model.layers.4.self_attn.o_proj.bias', 'model.layers.40.mlp.down_proj.bias', 'model.layers.40.mlp.gate_proj.bias', 'model.layers.40.mlp.up_proj.bias', 'model.layers.40.self_attn.o_proj.bias', 'model.layers.41.mlp.down_proj.bias', 'model.layers.41.mlp.gate_proj.bias', 'model.layers.41.mlp.up_proj.bias', 'model.layers.41.self_attn.o_proj.bias', 'model.layers.42.mlp.down_proj.bias', 'model.layers.42.mlp.gate_proj.bias', 'model.layers.42.mlp.up_proj.bias', 'model.layers.42.self_attn.o_proj.bias', 'model.layers.43.mlp.down_proj.bias', 'model.layers.43.mlp.gate_proj.bias', 'model.layers.43.mlp.up_proj.bias', 'model.layers.43.self_attn.o_proj.bias', 'model.layers.44.mlp.down_proj.bias', 'model.layers.44.mlp.gate_proj.bias', 'model.layers.44.mlp.up_proj.bias', 'model.layers.44.self_attn.o_proj.bias', 'model.layers.45.mlp.down_proj.bias', 'model.layers.45.mlp.gate_proj.bias', 'model.layers.45.mlp.up_proj.bias', 'model.layers.45.self_attn.o_proj.bias', 'model.layers.46.mlp.down_proj.bias', 'model.layers.46.mlp.gate_proj.bias', 'model.layers.46.mlp.up_proj.bias', 'model.layers.46.self_attn.o_proj.bias', 'model.layers.47.mlp.down_proj.bias', 'model.layers.47.mlp.gate_proj.bias', 'model.layers.47.mlp.up_proj.bias', 'model.layers.47.self_attn.o_proj.bias', 'model.layers.5.mlp.down_proj.bias', 'model.layers.5.mlp.gate_proj.bias', 'model.layers.5.mlp.up_proj.bias', 'model.layers.5.self_attn.o_proj.bias', 'model.layers.6.mlp.down_proj.bias', 'model.layers.6.mlp.gate_proj.bias', 
'model.layers.6.mlp.up_proj.bias', 'model.layers.6.self_attn.o_proj.bias', 'model.layers.7.mlp.down_proj.bias', 'model.layers.7.mlp.gate_proj.bias', 'model.layers.7.mlp.up_proj.bias', 'model.layers.7.self_attn.o_proj.bias', 'model.layers.8.mlp.down_proj.bias', 'model.layers.8.mlp.gate_proj.bias', 'model.layers.8.mlp.up_proj.bias', 'model.layers.8.self_attn.o_proj.bias', 'model.layers.9.mlp.down_proj.bias', 'model.layers.9.mlp.gate_proj.bias', 'model.layers.9.mlp.up_proj.bias', 'model.layers.9.self_attn.o_proj.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
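The warning means the listed bias tensors are not present in the checkpoint shards, so transformers initializes them randomly. A quick way to confirm they are genuinely absent on disk is to list the keys of a downloaded shard. A minimal sketch, assuming the safetensors package is installed; the directory comes from the log above, and the shard filename is illustrative only:

```python
from safetensors import safe_open

# Illustrative shard path: the cache directory comes from the log above,
# but the exact shard filename may differ in a real download.
shard = ("/home/wsl/.cache/modelscope/hub/qwen/"
         "Qwen2___5-14B-Instruct-GPTQ-Int4/model-00001-of-00003.safetensors")

with safe_open(shard, framework="pt") as f:
    missing = [k for k in f.keys() if k.endswith(("o_proj.bias",
                                                  "gate_proj.bias",
                                                  "up_proj.bias",
                                                  "down_proj.bias"))]

# Expected: an empty list -- the GPTQ checkpoint stores no biases for these
# projections, so they cannot be loaded and are "newly initialized" instead.
print(missing)
```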

Steps to reproduce

This happens with qwen/Qwen2.5-14B-Instruct-GPTQ-Int4, while qwen/Qwen1.5-14B-Chat-GPTQ-Int4 loads cleanly; a minimal loading sketch is below.
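The report does not include a script, but loading the checkpoint through ModelScope's transformers-compatible wrappers should be enough to surface the warning. A minimal sketch, assuming the modelscope package (pinned at 1.14.0 in the environment above) is installed:

```python
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_id = "qwen/Qwen2.5-14B-Instruct-GPTQ-Int4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loading the GPTQ checkpoint is the step that prints the
# "Some weights of Qwen2ForCausalLM were not initialized" warning
# when an affected optimum version is installed.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
```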

jklj077 commented 1 month ago

Hi, you can safely ignore the warnings.

Edited: this warning cannot be safely ignored, as it differs from the similar warnings previously seen with Qwen2. Please ensure optimum>1.20.0. Sorry for the inconvenience.
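For reference, the environment above pins optimum==1.19.2, which is below this threshold. A quick version check using only the standard library:

```python
# Check the installed optimum version; upgrade with e.g.
#   pip install -U "optimum>1.20.0"
from importlib.metadata import version

print(version("optimum"))  # the environment in this report shows 1.19.2
```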

huashiyiqike commented 1 month ago

Then it doesn't work. The error is: RuntimeError: probability tensor contains either inf, nan or element < 0 @jklj077
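For context, this RuntimeError is raised by torch's sampler: if the randomly initialized biases push NaN or inf into the logits, softmax propagates them into the probabilities and torch.multinomial refuses to sample. A standalone illustration (not the Qwen code path):

```python
import torch

# Once NaN enters the logits, softmax carries it into the probabilities
# and multinomial sampling fails with exactly this error.
logits = torch.tensor([0.1, float("nan"), 0.3])
probs = torch.softmax(logits, dim=-1)    # tensor([nan, nan, nan])
torch.multinomial(probs, num_samples=1)  # RuntimeError: probability tensor
                                         # contains either `inf`, `nan` or
                                         # element < 0
```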

volcano1995 commented 1 month ago

> Then it doesn't work. The error is: RuntimeError: probability tensor contains either inf, nan or element < 0 @jklj077

I am running into this problem as well. Did you solve it?