QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.

[Badcase]: GPTQ on ModelScope: some weights not initialized #930

Closed: huashiyiqike closed this issue 1 month ago

huashiyiqike commented 1 month ago

Model Series

Qwen2.5

What are the models used?

qwen/Qwen2.5-14B-Instruct-GPTQ-Int4

What is the scenario where the problem happened?

Loading the model emits the warning: Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at <model path>

Is this badcase known and can it be solved using available techniques?

Information about environment

There is no such problem if I change the model to qwen/Qwen1.5-14B-Chat-GPTQ-Int4.

OS: ubuntu20.04 Python:3.10.9 NVIDIA driver: 560.35.02 cuda_11.7 pytorch: 2.0.1+cu117 requirements.txt: absl-py==1.4.0 accelerate==0.30.1 addict==2.4.0 aiofiles==23.1.0 aiohttp==3.8.4 aiosignal==1.3.1 aliyun-python-sdk-core==2.13.36 aliyun-python-sdk-kms==2.16.1 altair==4.2.2 anyio==3.6.2 appdirs==1.4.4 argilla==1.6.0 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 arrow==1.2.3 asttokens==2.2.1 async-timeout==4.0.2 attrs==23.1.0 backcall==0.2.0 backoff==2.2.1 beautifulsoup4==4.12.2 bitsandbytes==0.43.1 bleach==6.0.0 bottle==0.12.25 cachetools==5.3.0 certifi==2022.12.7 cffi==1.15.1 chardet==5.1.0 charset-normalizer==3.1.0 click==8.1.3 cmake==3.26.1 coloredlogs==15.0.1 comm==0.1.3 commonmark==0.9.1 contourpy==1.0.7 cpm-kernels==1.0.11 crcmod==1.7 cryptography==40.0.2 cycler==0.11.0 dataclasses-json==0.5.7 datasets==2.18.0 DBUtils==3.0.3 debugpy==1.6.7 decorator==5.1.1 deepspeed defusedxml==0.7.1 Deprecated==1.2.13 dill==0.3.6 docker-pycreds==0.4.0 docx2txt==0.8 einops==0.6.1 entrypoints==0.4 environs==9.5.0 et-xmlfile==1.1.0 executing==1.2.0 faiss-cpu==1.7.4 fastapi==0.95.1 fastjsonschema==2.16.3 ffmpy==0.3.0 filelock==3.11.0 flash-attn @ file:///mnt/c/Users/huash/Downloads/flash-attention-main%20%281%29/flash-attention-main/flash_attn-2.5.7%2Bcu118torch2.0cxx11abiFALSE-cp310-cp310-linux_x86_64.whl#sha256=d2fde085802c694d67ab4c7573e38da54539dfdaab6b809b3e107f18c39ac3f0 flexgen==0.1.7 fonttools==4.39.3 fqdn==1.5.1 frozenlist==1.3.3 fsspec==2024.2.0 gast==0.5.4 gekko==1.1.1 gitdb==4.0.10 GitPython==3.1.31 google-auth==2.17.3 google-auth-oauthlib==1.0.0 gradio==3.23.0 gradio_client==0.1.3 graypy==2.1.0 greenlet==2.0.2 grpcio grpcio-tools==1.53.0 h11==0.14.0 hjson==3.1.0 httpcore==0.16.3 httpx==0.23.3 huggingface-hub==0.23.0 humanfriendly==10.0 icetk==0.0.4 idna==3.4 importlib-metadata==6.8.0 InstructorEmbedding==1.0.0 ipykernel==6.22.0 ipython==8.12.0 ipython-genutils==0.2.0 ipywidgets==8.0.6 isoduration==20.11.0 jedi==0.18.2 jieba==0.42.1 Jinja2==3.1.2 jmespath==0.10.0 joblib==1.2.0 jsonpointer==2.3 jsonschema==4.17.3 jupyter-events==0.6.3 jupyter_client==8.2.0 jupyter_core==5.3.0 jupyter_server==2.5.0 jupyter_server_terminals==0.4.4 jupyterlab-pygments==0.2.2 jupyterlab-widgets==3.0.7 kiwisolver==1.4.4 langchain==0.0.150 linkify-it-py==2.0.0 lit==16.0.0 llama-cpp-python==0.1.48 lxml==4.9.2 Markdown==3.4.3 markdown-it-py==2.2.0 markdown-to-json==2.1.0 markdown2==2.4.8 MarkupSafe==2.1.2 marshmallow==3.19.0 marshmallow-enum==1.5.1 matplotlib==3.7.1 matplotlib-inline==0.1.6 mdit-py-plugins==0.3.3 mdurl==0.1.2 mistune==2.0.5 modelscope==1.14.0 monotonic==1.6 mpmath==1.3.0 msg-parser==1.2.0 multidict==6.0.4 multiprocess==0.70.14 mypy-extensions==1.0.0 nbclassic==0.5.5 nbclient==0.7.3 nbconvert==7.3.1 nbformat==5.8.0 nest-asyncio==1.5.6 networkx==3.0 nh3==0.2.11 ninja==1.11.1.1 nltk==3.8.1 notebook==6.5.4 notebook_shim==0.2.2 numexpr==2.8.4 numpy==1.22.0 nvidia-cublas-cu11==11.10.3.66 nvidia-cuda-cupti-cu11==11.7.101 nvidia-cuda-nvrtc-cu11==11.7.99 nvidia-cuda-runtime-cu11==11.7.99 nvidia-cudnn-cu11==8.5.0.96 nvidia-cufft-cu11==10.9.0.58 nvidia-curand-cu11==10.2.10.91 nvidia-cusolver-cu11==11.4.0.1 nvidia-cusparse-cu11==11.7.4.91 nvidia-nccl-cu11==2.14.3 nvidia-nvtx-cu11==11.7.91 oauthlib==3.2.2 olefile==0.46 openapi-schema-pydantic==1.2.4 openpyxl==3.1.2 optimum==1.19.2 orjson==3.8.10 oss2==2.18.1 packaging==23.1 pandas==1.3.5 pandocfilters==1.5.0 parso==0.8.3 Paste==3.5.2 pathtools==0.1.2 pdfkit==1.0.0 pdfminer.six==20221105 pdfplumber==0.9.0 peft==0.10.0 pexpect==4.8.0 pickleshare==0.7.5 
Pillow==9.5.0 platformdirs==3.10.0 prometheus-client==0.16.0 prompt-toolkit==3.0.38 protobuf psutil==5.9.4 ptyprocess==0.7.0 PuLP==2.7.0 pure-eval==0.2.2 py-cpuinfo==9.0.0 pyarrow==16.0.0 pyarrow-hotfix==0.6 pyasn1==0.5.0 pyasn1-modules==0.3.0 pycparser==2.21 pycryptodome==3.17 pydantic==1.10.7 pydub==0.25.1 Pygments==2.15.1 pymilvus==2.2.8 PyMuPDF==1.22.3 PyMySQL==1.1.0 pyodbc==4.0.39 pypandoc==1.11 pyparsing==3.0.9 pyre-extensions==0.0.29 pyrsistent==0.19.3 python-dateutil==2.8.2 python-docx==0.8.11 python-dotenv==1.0.0 python-json-logger==2.0.7 python-magic==0.4.27 python-multipart==0.0.6 python-pptx==0.6.21 pytz==2023.3 PyYAML==6.0 pyzmq==25.0.2 quant-cuda==0.0.0 redis==4.6.0 regex==2023.3.23 requests==2.28.2 requests-oauthlib==1.3.1 responses==0.18.0 rfc3339-validator==0.1.4 rfc3986==1.5.0 rfc3986-validator==0.1.1 rich==13.0.1 rouge==1.0.1 rsa==4.9 rwkv==0.7.3 safetensors==0.4.3 scikit-learn==1.2.2 scipy==1.10.1 semantic-version==2.10.0 Send2Trash==1.8.0 sentence-transformers==2.2.2 sentencepiece==0.1.97 sentry-sdk==1.22.2 setproctitle==1.3.2 shortuuid==1.0.11 simplejson==3.19.1 six==1.16.0 smmap==5.0.0 sniffio==1.3.0 sortedcontainers==2.4.0 soupsieve==2.4.1 SQLAlchemy==2.0.10 SQLAlchemy-Utils==0.41.0 stack-data==0.6.2 starlette==0.26.1 style==1.1.0 svgwrite==1.4.3 sympy==1.11.1 tenacity==8.2.2 tensorboard==2.12.2 tensorboard-data-server==0.7.0 tensorboard-plugin-wit==1.8.1 terminado==0.17.1 threadpoolctl==3.1.0 tiktoken==0.3.3 tinycss2==1.2.1 tokenizers==0.19.1 tomli==2.0.1 toolz==0.12.0 torch==2.0.1 torchaudio torchvision tornado==6.3.1 tqdm==4.65.0 traitlets==5.9.0 transformers==4.40.2 transformers-stream-generator==0.0.4 triton==2.0.0 typing-inspect==0.8.0 typing_extensions==4.5.0 tzdata==2023.3 uc-micro-py==1.0.1 ujson==5.7.0 unstructured==0.6.2 update==0.0.1 uplink==0.9.7 uri-template==1.2.0 uritemplate==4.1.1 urllib3==1.26.15 uvicorn==0.21.1 Wand==0.6.11 wandb==0.15.2 wavedrom==2.0.3.post3 wcwidth==0.2.6 webcolors==1.13 webencodings==0.5.1 websocket-client==1.5.1 websockets==11.0.1 Werkzeug==2.2.3 widgetsnbextension==4.0.7 wrapt==1.14.1 xformers==0.0.20 XlsxWriter==3.1.0 xxhash==3.2.0 yapf==0.40.1 yarl==1.8.2 zipp==3.16.2

Description

The output is as follows:

Loading checkpoint shards: 100%|███████████████████████████████████████████████████| 3/3 [00:07<00:00, 2.57s/it] Some weights of Qwen2ForCausalLM were not initialized from the model checkpoint at /home/wsl/.cache/modelscope/hub/qwen/Qwen2___5-14B-Instruct-GPTQ-Int4 and are newly initialized: ['model.layers.0.mlp.down_proj.bias', 'model.layers.0.mlp.gate_proj.bias', 'model.layers.0.mlp.up_proj.bias', 'model.layers.0.self_attn.o_proj.bias', 'model.layers.1.mlp.down_proj.bias', 'model.layers.1.mlp.gate_proj.bias', 'model.layers.1.mlp.up_proj.bias', 'model.layers.1.self_attn.o_proj.bias', 'model.layers.10.mlp.down_proj.bias', 'model.layers.10.mlp.gate_proj.bias', 'model.layers.10.mlp.up_proj.bias', 'model.layers.10.self_attn.o_proj.bias', 'model.layers.11.mlp.down_proj.bias', 'model.layers.11.mlp.gate_proj.bias', 'model.layers.11.mlp.up_proj.bias', 'model.layers.11.self_attn.o_proj.bias', 'model.layers.12.mlp.down_proj.bias', 'model.layers.12.mlp.gate_proj.bias', 'model.layers.12.mlp.up_proj.bias', 'model.layers.12.self_attn.o_proj.bias', 'model.layers.13.mlp.down_proj.bias', 'model.layers.13.mlp.gate_proj.bias', 'model.layers.13.mlp.up_proj.bias', 'model.layers.13.self_attn.o_proj.bias', 'model.layers.14.mlp.down_proj.bias', 'model.layers.14.mlp.gate_proj.bias', 'model.layers.14.mlp.up_proj.bias', 'model.layers.14.self_attn.o_proj.bias', 'model.layers.15.mlp.down_proj.bias', 'model.layers.15.mlp.gate_proj.bias', 'model.layers.15.mlp.up_proj.bias', 'model.layers.15.self_attn.o_proj.bias', 'model.layers.16.mlp.down_proj.bias', 'model.layers.16.mlp.gate_proj.bias', 'model.layers.16.mlp.up_proj.bias', 'model.layers.16.self_attn.o_proj.bias', 'model.layers.17.mlp.down_proj.bias', 'model.layers.17.mlp.gate_proj.bias', 'model.layers.17.mlp.up_proj.bias', 'model.layers.17.self_attn.o_proj.bias', 'model.layers.18.mlp.down_proj.bias', 'model.layers.18.mlp.gate_proj.bias', 'model.layers.18.mlp.up_proj.bias', 'model.layers.18.self_attn.o_proj.bias', 'model.layers.19.mlp.down_proj.bias', 'model.layers.19.mlp.gate_proj.bias', 'model.layers.19.mlp.up_proj.bias', 'model.layers.19.self_attn.o_proj.bias', 'model.layers.2.mlp.down_proj.bias', 'model.layers.2.mlp.gate_proj.bias', 'model.layers.2.mlp.up_proj.bias', 'model.layers.2.self_attn.o_proj.bias', 'model.layers.20.mlp.down_proj.bias', 'model.layers.20.mlp.gate_proj.bias', 'model.layers.20.mlp.up_proj.bias', 'model.layers.20.self_attn.o_proj.bias', 'model.layers.21.mlp.down_proj.bias', 'model.layers.21.mlp.gate_proj.bias', 'model.layers.21.mlp.up_proj.bias', 'model.layers.21.self_attn.o_proj.bias', 'model.layers.22.mlp.down_proj.bias', 'model.layers.22.mlp.gate_proj.bias', 'model.layers.22.mlp.up_proj.bias', 'model.layers.22.self_attn.o_proj.bias', 'model.layers.23.mlp.down_proj.bias', 'model.layers.23.mlp.gate_proj.bias', 'model.layers.23.mlp.up_proj.bias', 'model.layers.23.self_attn.o_proj.bias', 'model.layers.24.mlp.down_proj.bias', 'model.layers.24.mlp.gate_proj.bias', 'model.layers.24.mlp.up_proj.bias', 'model.layers.24.self_attn.o_proj.bias', 'model.layers.25.mlp.down_proj.bias', 'model.layers.25.mlp.gate_proj.bias', 'model.layers.25.mlp.up_proj.bias', 'model.layers.25.self_attn.o_proj.bias', 'model.layers.26.mlp.down_proj.bias', 'model.layers.26.mlp.gate_proj.bias', 'model.layers.26.mlp.up_proj.bias', 'model.layers.26.self_attn.o_proj.bias', 'model.layers.27.mlp.down_proj.bias', 'model.layers.27.mlp.gate_proj.bias', 'model.layers.27.mlp.up_proj.bias', 'model.layers.27.self_attn.o_proj.bias', 'model.layers.28.mlp.down_proj.bias', 
'model.layers.28.mlp.gate_proj.bias', 'model.layers.28.mlp.up_proj.bias', 'model.layers.28.self_attn.o_proj.bias', 'model.layers.29.mlp.down_proj.bias', 'model.layers.29.mlp.gate_proj.bias', 'model.layers.29.mlp.up_proj.bias', 'model.layers.29.self_attn.o_proj.bias', 'model.layers.3.mlp.down_proj.bias', 'model.layers.3.mlp.gate_proj.bias', 'model.layers.3.mlp.up_proj.bias', 'model.layers.3.self_attn.o_proj.bias', 'model.layers.30.mlp.down_proj.bias', 'model.layers.30.mlp.gate_proj.bias', 'model.layers.30.mlp.up_proj.bias', 'model.layers.30.self_attn.o_proj.bias', 'model.layers.31.mlp.down_proj.bias', 'model.layers.31.mlp.gate_proj.bias', 'model.layers.31.mlp.up_proj.bias', 'model.layers.31.self_attn.o_proj.bias', 'model.layers.32.mlp.down_proj.bias', 'model.layers.32.mlp.gate_proj.bias', 'model.layers.32.mlp.up_proj.bias', 'model.layers.32.self_attn.o_proj.bias', 'model.layers.33.mlp.down_proj.bias', 'model.layers.33.mlp.gate_proj.bias', 'model.layers.33.mlp.up_proj.bias', 'model.layers.33.self_attn.o_proj.bias', 'model.layers.34.mlp.down_proj.bias', 'model.layers.34.mlp.gate_proj.bias', 'model.layers.34.mlp.up_proj.bias', 'model.layers.34.self_attn.o_proj.bias', 'model.layers.35.mlp.down_proj.bias', 'model.layers.35.mlp.gate_proj.bias', 'model.layers.35.mlp.up_proj.bias', 'model.layers.35.self_attn.o_proj.bias', 'model.layers.36.mlp.down_proj.bias', 'model.layers.36.mlp.gate_proj.bias', 'model.layers.36.mlp.up_proj.bias', 'model.layers.36.self_attn.o_proj.bias', 'model.layers.37.mlp.down_proj.bias', 'model.layers.37.mlp.gate_proj.bias', 'model.layers.37.mlp.up_proj.bias', 'model.layers.37.self_attn.o_proj.bias', 'model.layers.38.mlp.down_proj.bias', 'model.layers.38.mlp.gate_proj.bias', 'model.layers.38.mlp.up_proj.bias', 'model.layers.38.self_attn.o_proj.bias', 'model.layers.39.mlp.down_proj.bias', 'model.layers.39.mlp.gate_proj.bias', 'model.layers.39.mlp.up_proj.bias', 'model.layers.39.self_attn.o_proj.bias', 'model.layers.4.mlp.down_proj.bias', 'model.layers.4.mlp.gate_proj.bias', 'model.layers.4.mlp.up_proj.bias', 'model.layers.4.self_attn.o_proj.bias', 'model.layers.40.mlp.down_proj.bias', 'model.layers.40.mlp.gate_proj.bias', 'model.layers.40.mlp.up_proj.bias', 'model.layers.40.self_attn.o_proj.bias', 'model.layers.41.mlp.down_proj.bias', 'model.layers.41.mlp.gate_proj.bias', 'model.layers.41.mlp.up_proj.bias', 'model.layers.41.self_attn.o_proj.bias', 'model.layers.42.mlp.down_proj.bias', 'model.layers.42.mlp.gate_proj.bias', 'model.layers.42.mlp.up_proj.bias', 'model.layers.42.self_attn.o_proj.bias', 'model.layers.43.mlp.down_proj.bias', 'model.layers.43.mlp.gate_proj.bias', 'model.layers.43.mlp.up_proj.bias', 'model.layers.43.self_attn.o_proj.bias', 'model.layers.44.mlp.down_proj.bias', 'model.layers.44.mlp.gate_proj.bias', 'model.layers.44.mlp.up_proj.bias', 'model.layers.44.self_attn.o_proj.bias', 'model.layers.45.mlp.down_proj.bias', 'model.layers.45.mlp.gate_proj.bias', 'model.layers.45.mlp.up_proj.bias', 'model.layers.45.self_attn.o_proj.bias', 'model.layers.46.mlp.down_proj.bias', 'model.layers.46.mlp.gate_proj.bias', 'model.layers.46.mlp.up_proj.bias', 'model.layers.46.self_attn.o_proj.bias', 'model.layers.47.mlp.down_proj.bias', 'model.layers.47.mlp.gate_proj.bias', 'model.layers.47.mlp.up_proj.bias', 'model.layers.47.self_attn.o_proj.bias', 'model.layers.5.mlp.down_proj.bias', 'model.layers.5.mlp.gate_proj.bias', 'model.layers.5.mlp.up_proj.bias', 'model.layers.5.self_attn.o_proj.bias', 'model.layers.6.mlp.down_proj.bias', 'model.layers.6.mlp.gate_proj.bias', 
'model.layers.6.mlp.up_proj.bias', 'model.layers.6.self_attn.o_proj.bias', 'model.layers.7.mlp.down_proj.bias', 'model.layers.7.mlp.gate_proj.bias', 'model.layers.7.mlp.up_proj.bias', 'model.layers.7.self_attn.o_proj.bias', 'model.layers.8.mlp.down_proj.bias', 'model.layers.8.mlp.gate_proj.bias', 'model.layers.8.mlp.up_proj.bias', 'model.layers.8.self_attn.o_proj.bias', 'model.layers.9.mlp.down_proj.bias', 'model.layers.9.mlp.gate_proj.bias', 'model.layers.9.mlp.up_proj.bias', 'model.layers.9.self_attn.o_proj.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference. Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
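The warning means the listed bias tensors are not present in the checkpoint shards, so transformers initializes them randomly. A quick way to confirm they are genuinely absent on disk is to list the keys of a downloaded shard. A minimal sketch, assuming the safetensors package is installed; the directory comes from the log above, and the shard filename is illustrative only:

```python
from safetensors import safe_open

# Illustrative shard path: the cache directory comes from the log above,
# but the exact shard filename may differ in a real download.
shard = ("/home/wsl/.cache/modelscope/hub/qwen/"
         "Qwen2___5-14B-Instruct-GPTQ-Int4/model-00001-of-00003.safetensors")

with safe_open(shard, framework="pt") as f:
    missing = [k for k in f.keys() if k.endswith(("o_proj.bias",
                                                  "gate_proj.bias",
                                                  "up_proj.bias",
                                                  "down_proj.bias"))]

# Expected: an empty list -- the GPTQ checkpoint stores no biases for these
# projections, so they cannot be loaded and are "newly initialized" instead.
print(missing)
```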

Steps to reproduce

This happens with qwen/Qwen2.5-14B-Instruct-GPTQ-Int4, while qwen/Qwen1.5-14B-Chat-GPTQ-Int4 loads cleanly; a minimal loading sketch is below.
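The report does not include a script, but loading the checkpoint through ModelScope's transformers-compatible wrappers should be enough to surface the warning. A minimal sketch, assuming the modelscope package (pinned at 1.14.0 in the environment above) is installed:

```python
from modelscope import AutoModelForCausalLM, AutoTokenizer

model_id = "qwen/Qwen2.5-14B-Instruct-GPTQ-Int4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Loading the GPTQ checkpoint is the step that prints the
# "Some weights of Qwen2ForCausalLM were not initialized" warning
# when an affected optimum version is installed.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)
```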

jklj077 commented 1 month ago

Hi, you can safely ignore the warnings.

Edited: this warning cannot be safely ignored, as it differs from the similar warnings previously seen with Qwen2. Please ensure optimum>1.20.0. Sorry for the inconvenience.
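For reference, the environment above pins optimum==1.19.2, which is below this threshold. A quick version check using only the standard library:

```python
# Check the installed optimum version; upgrade with e.g.
#   pip install -U "optimum>1.20.0"
from importlib.metadata import version

print(version("optimum"))  # the environment in this report shows 1.19.2
```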

huashiyiqike commented 1 month ago

Then it doesn't work. The error is: RuntimeError: probability tensor contains either inf, nan or element < 0 @jklj077
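For context, this RuntimeError is raised by torch's sampler: if the randomly initialized biases push NaN or inf into the logits, softmax propagates them into the probabilities and torch.multinomial refuses to sample. A standalone illustration (not the Qwen code path):

```python
import torch

# Once NaN enters the logits, softmax carries it into the probabilities
# and multinomial sampling fails with exactly this error.
logits = torch.tensor([0.1, float("nan"), 0.3])
probs = torch.softmax(logits, dim=-1)    # tensor([nan, nan, nan])
torch.multinomial(probs, num_samples=1)  # RuntimeError: probability tensor
                                         # contains either `inf`, `nan` or
                                         # element < 0
```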

volcano1995 commented 1 month ago

> Then it doesn't work. The error is: RuntimeError: probability tensor contains either inf, nan or element < 0 @jklj077

I am running into this problem as well. Did you solve it?