QwenLM / Qwen2.5

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

[Badcase]: Qwen2.5-72B-Instruct-GPTQ-Int4 input_size_per_partition #986

Open hyliush opened 3 days ago

hyliush commented 3 days ago

Model Series

Qwen2.5

What are the models used?

Qwen2.5-72B-Instruct-GPTQ-Int4

What is the scenario where the problem happened?

Qwen2.5-72B-Instruct-GPTQ-Int4 params error

Is this badcase known and can it be solved using available techniques?

Information about environment

absl-py 2.1.0 accelerate 0.31.0 adaseq 0.6.6 addict 2.4.0 aiohttp 3.9.5 aiosignal 1.3.1 albucore 0.0.12 albumentations 1.4.10 alias-free-torch 0.0.6 aliyun-python-sdk-core 2.15.1 aliyun-python-sdk-kms 2.16.3 aniso8601 9.0.1 annotated-types 0.7.0 antlr4-python3-runtime 4.9.3 anyio 4.4.0 apex 0.1 appdirs 1.4.4 argon2-cffi 23.1.0 argon2-cffi-bindings 21.2.0 arrow 1.3.0 asttokens 2.4.1 astunparse 1.6.3 async-lru 2.0.4 async-timeout 4.0.3 attrs 23.2.0 audioread 3.0.1 auto_gptq 0.7.1 autoawq 0.2.5 autoawq_kernels 0.0.6 av 12.2.0 Babel 2.15.0 basicsr 1.4.2 beartype 0.18.5 beautifulsoup4 4.12.3 bidict 0.23.1 binpacking 1.5.2 biopython 1.83 bitarray 2.9.2 bitsandbytes 0.43.1 bitstring 4.2.3 black 24.4.2 bleach 6.1.0 blis 0.7.11 blobfile 2.1.1 bmt-clipit 1.0 boto3 1.34.136 botocore 1.34.136 cachetools 5.3.3 catalogue 2.0.10 certifi 2024.2.2 cffi 1.16.0 cfgv 3.4.0 charset-normalizer 3.3.2 chumpy 0.70 cityscapesScripts 2.2.3 click 8.1.7 clip 1.0 cloudpathlib 0.18.1 cloudpickle 3.0.0 cmake 3.29.6 colorama 0.4.6 coloredlogs 14.0 comm 0.2.2 confection 0.1.5 ConfigArgParse 1.7 contextlib2 21.6.0 contourpy 1.2.1 control-ldm 0.0.1 crcmod 1.7 cryptography 42.0.8 cycler 0.12.1 cymem 2.0.8 Cython 0.29.36 dacite 1.8.1 dataclasses 0.6 datasets 2.18.0 ddpm-guided-diffusion 0.0.0 debugpy 1.8.2 decorator 4.4.2 decord 0.6.0 deepspeed 0.14.4 defusedxml 0.7.1 descartes 1.1.0 detectron2 0.6 dgl 2.1.0+cu121 diffusers 0.29.2 dill 0.3.8 diskcache 5.6.3 Distance 0.1.3 distlib 0.3.8 distro 1.9.0 dnspython 2.3.0 docstring_parser 0.16 easydict 1.13 easyrobust 0.2.4 edit-distance 1.0.6 editdistance 0.5.2 einops 0.8.0 email_validator 2.2.0 embeddings 0.0.8 emoji 2.12.1 espnet-tts-frontend 0.0.3 et-xmlfile 1.1.0 eventlet 0.36.1 exceptiongroup 1.2.1 executing 2.0.1 expecttest 0.2.1 face-alignment 1.4.1 fairscale 0.4.13 fairseq 0.12.2 fastai 2.7.15 fastapi 0.111.0 fastapi-cli 0.0.4 fastcore 1.5.48 fastdownload 0.0.7 fastjsonschema 2.20.0 fastprogress 1.0.3 fasttext 0.9.3 ffmpeg 1.4 ffmpeg-python 0.2.0 
filelock 3.14.0 fire 0.6.0 flake8 7.1.0 flash_attn 2.5.9.post1 Flask 2.2.5 Flask-Cors 4.0.1 Flask-RESTful 0.3.10 Flask-SocketIO 5.3.6 flask-talisman 1.1.0 flatbuffers 24.3.25 fonttools 4.53.0 fqdn 1.5.1 frozenlist 1.4.1 fsspec 2024.2.0 ftfy 6.2.0 funasr 1.0.30 funcodec 0.2.0 funtextprocessing 0.1.1 future 1.0.0 fvcore 0.1.5.post20221221 g2p 2.0.0 g2p-en 2.1.0 gast 0.5.4 gekko 1.1.3 google-pasta 0.2.0 greenlet 3.0.3 grpcio 1.64.0 h11 0.14.0 h5py 3.11.0 hdbscan 0.8.37 hjson 3.1.0 httpcore 1.0.5 httptools 0.6.1 httpx 0.27.0 huggingface-hub 0.23.4 humanfriendly 10.0 hydra-core 1.3.2 HyperPyYAML 1.2.2 identify 2.5.36 idna 3.7 imageio 2.34.2 imageio-ffmpeg 0.4.9 imgaug 0.4.0 importlib_metadata 7.1.0 inflect 7.0.0 iniconfig 2.0.0 interegular 0.3.3 iopath 0.1.9 ipdb 0.13.13 ipykernel 6.29.4 ipython 8.24.0 isoduration 20.11.0 isort 5.13.2 itsdangerous 2.2.0 jaconv 0.3.4 jamo 0.4.1 jedi 0.19.1 jieba 0.42.1 Jinja2 3.1.4 jmespath 0.10.0 joblib 1.4.2 json-tricks 3.17.3 json5 0.9.25 jsonplus 0.8.0 jsonpointer 3.0.0 jsonschema 4.22.0 jsonschema-specifications 2023.12.1 jupyter_client 8.6.2 jupyter_core 5.7.2 jupyter-events 0.10.0 jupyter-lsp 2.2.5 jupyter_server 2.14.1 jupyter_server_terminals 0.5.3 jupyterlab 4.2.3 jupyterlab_pygments 0.3.0 jupyterlab_server 2.27.2 kaldiio 2.18.0 kantts 1.0.1 keras 3.3.3 kiwisolver 1.4.5 kornia 0.7.3 kornia_rs 0.1.4 kwsbp 0.0.6 langcodes 3.4.0 language_data 1.2.0 lap 0.4.0 lark 1.1.9 lazy_loader 0.4 libclang 18.1.1 librosa 0.10.1 lightning-utilities 0.11.3.post0 llvmlite 0.43.0 lm-format-enforcer 0.10.1 lmdb 1.5.1 local-attention 1.9.3 lpips 0.1.4 lxml 4.9.4 lyft-dataset-sdk 0.0.8 marisa-trie 1.2.0 Markdown 3.6 markdown-it-py 3.0.0 MarkupSafe 2.1.5 matplotlib 3.5.3 matplotlib-inline 0.1.7 mccabe 0.7.0 mdurl 0.1.2 megatron-util 1.3.2 MinDAEC 0.0.2 mir-eval 0.7 mistune 3.0.2 ml-collections 0.1.1 ml-dtypes 0.3.2 mmcls 0.25.0 mmcv-full 1.7.0 mmdet 2.28.2 mmdet3d 1.0.0a1 mmsegmentation 0.30.0 mock 5.1.0 modelscope 1.16.0 moviepy 1.0.3 mpi4py 3.1.6 
mpmath 1.3.0 ms-swift 2.1.1.post2 msgpack 1.0.8 multidict 6.0.5 multiprocess 0.70.16 munkres 1.1.4 murmurhash 1.0.10 mypy-extensions 1.0.0 namex 0.0.8 nbclient 0.10.0 nbconvert 7.16.4 nbformat 5.10.4 nerfacc 0.2.2 nest-asyncio 1.6.0 networkx 3.3 ninja 1.11.1.1 nltk 3.8.1 nodeenv 1.9.1 notebook_shim 0.2.4 numba 0.60.0 numpy 1.26.3 nuscenes-devkit 1.1.11 nvdiffrast 0.3.1 nvidia-ml-py 12.555.43 omegaconf 2.3.0 onnx 1.16.1 onnxruntime 1.18.1 onnxsim 0.4.36 open-clip-torch 2.24.0 openai 1.35.7 opencv-python 4.10.0.84 opencv-python-headless 4.10.0.84 openpyxl 3.1.5 opt-einsum 3.3.0 optimum 1.20.0 optree 0.11.0 orjson 3.10.5 oss2 2.18.6 outlines 0.0.46 overrides 7.7.0 packaging 24.0 pai-easycv 0.11.6 paint-ldm 0.0.0 pandas 2.2.2 pandocfilters 1.5.1 panopticapi 0.1 panphon 0.20.0 parso 0.8.4 pathspec 0.12.1 peft 0.11.1 pexpect 4.9.0 phaseaug 1.0.1 pickleshare 0.7.5 pillow 10.2.0 pip 23.0.1 platformdirs 4.2.2 plotly 5.22.0 pluggy 1.5.0 plyfile 1.0.3 pointnet2 0.0.0 pooch 1.8.2 portalocker 2.8.2 pre-commit 3.7.1 preshed 3.0.9 prettytable 3.10.0 proglog 0.1.10 prometheus_client 0.20.0 prometheus-fastapi-instrumentator 7.0.0 prompt_toolkit 3.0.45 protobuf 3.20.3 psutil 5.9.8 ptflops 0.7.3 ptyprocess 0.7.0 pure-eval 0.2.2 py-cpuinfo 9.0.0 py-sound-connect 0.2.1 pyairports 2.1.1 pyarrow 16.1.0 pyarrow-hotfix 0.6 pybind11 2.13.1 pyclipper 1.3.0.post5 pycocoevalcap 1.2 pycocotools 2.0.8 pycodestyle 2.12.0 pycountry 24.6.1 pycparser 2.22 pycryptodome 3.20.0 pycryptodomex 3.20.0 pydantic 2.7.4 pydantic_core 2.18.4 pyDeprecate 0.3.2 pydot 2.0.0 pyflakes 3.2.0 Pygments 2.18.0 PyMCubes 0.1.4 pynini 2.1.5 pynndescent 0.5.13 pyparsing 3.1.2 pypinyin 0.44.0 pyquaternion 0.9.9 pysptk 0.1.18 pytest 8.2.2 pythainlp 5.0.4 python-crfsuite 0.9.10 python-dateutil 2.9.0.post0 python-dotenv 1.0.1 python-engineio 4.9.1 python-json-logger 2.0.7 python-multipart 0.0.9 python-socketio 5.11.3 pytorch-lightning 1.7.7 pytorch-metric-learning 2.5.0 pytorch-wavelets 1.3.0 pytorch-wpe 0.0.1 pytorch3d 0.7.6 
pytz 2024.1 pyvi 0.1.1 PyWavelets 1.6.0 PyYAML 6.0.1 pyzmq 26.0.3 rapidfuzz 3.9.3 ray 2.31.0 referencing 0.35.1 regex 2024.5.15 requests 2.32.3 resampy 0.4.3 rfc3339-validator 0.1.4 rfc3986-validator 0.1.1 rich 13.7.1 rotary-embedding-torch 0.6.3 rouge 1.0.1 rouge-score 0.0.4 rpds-py 0.18.1 ruamel.yaml 0.18.6 ruamel.yaml.clib 0.2.8 s3transfer 0.10.2 sacrebleu 2.4.2 sacremoses 0.1.1 safetensors 0.4.3 scikit-image 0.24.0 scikit-learn 1.5.0 scipy 1.12.0 seaborn 0.13.2 Send2Trash 1.8.3 sentencepiece 0.2.0 seqeval 1.2.2 setuptools 70.1.1 Shapely 1.8.4 shellingham 1.5.4 shotdetect-scenedetect-lgss 0.0.4 shtab 1.7.1 simple-websocket 1.0.0 simplejson 3.19.2 six 1.16.0 sklearn-crfsuite 0.5.0 smart-open 7.0.4 smplx 0.1.28 sniffio 1.3.1 sortedcontainers 2.4.0 soundfile 0.12.1 soupsieve 2.5 sox 1.5.0 soxr 0.3.7 spacy 3.7.5 spacy-legacy 3.0.12 spacy-loggers 1.0.5 speechbrain 1.0.0 srsly 2.4.8 sse-starlette 2.1.2 stack-data 0.6.3 stanza 1.8.2 starlette 0.37.2 subword-nmt 0.3.8 sympy 1.12.1 tabulate 0.9.0 taming-transformers-rom1504 0.0.6 tenacity 8.4.2 tensorboard 2.17.0 tensorboard-data-server 0.7.2 tensorboardX 2.6.2.2 tensordict 0.4.0 tensorflow 2.16.1 tensorflow-estimator 2.15.0 tensorflow-io-gcs-filesystem 0.37.0 termcolor 2.4.0 terminado 0.18.1 terminaltables 3.1.10 text-unidecode 1.3 text2sql-lgesql 1.3.0 tf-slim 1.1.0 thinc 8.2.5 thop 0.1.1.post2209072238 threadpoolctl 3.5.0 tifffile 2024.6.18 tiktoken 0.7.0 timm 1.0.7 tinycss2 1.3.0 tinycudann 1.7+torch230cu121 tokenizers 0.19.1 toml 0.10.2 tomli 2.0.1 torch 2.3.0+cu121 torch-complex 0.4.4 torch-scatter 2.1.2 torchaudio 2.3.0+cu121 torchdata 0.7.1 torchmetrics 0.11.4 torchsde 0.2.6 torchsummary 1.5.1 torchvision 0.18.0+cu121 tornado 6.4.1 tqdm 4.66.4 traitlets 5.14.3 trampoline 0.1.2 transformers 4.41.2 transformers-stream-generator 0.0.5 trimesh 2.35.39 triton 2.3.1 trl 0.9.4 ttsfrd 0.2.1 typeguard 2.13.3 typer 0.12.3 types-python-dateutil 2.9.0.20240316 typing 3.7.4.3 typing_extensions 4.12.0 tyro 0.8.5 tzdata 2024.1 
ujson 5.10.0 umap-learn 0.5.6 unicodecsv 0.14.1 unicodedata2 15.1.0 unicore 1.2.1 Unidecode 1.3.8 uri-template 1.3.0 urllib3 2.2.1 utils 1.0.2 uvicorn 0.30.1 uvloop 0.19.0 videofeatures-clipit 1.0 virtualenv 20.26.3 vllm 0.5.0.post1 vllm-flash-attn 2.5.9 wasabi 1.1.3 watchfiles 0.22.0 wcwidth 0.2.13 weasel 0.4.1 webcolors 24.6.0 webencodings 0.5.1 websocket-client 1.8.0 websockets 12.0 Werkzeug 3.0.3 wget 3.2 wheel 0.43.0 wrapt 1.16.0 wsproto 1.2.0 xformers 0.0.26.post1 xtcocotools 1.14 xxhash 3.4.1 yacs 0.1.8 yapf 0.30.0 yarl 1.9.4 zhconv 1.4.3 zipp 3.19.0 zstandard 0.22.0

Description

llm = LLM(model=model_path, trust_remote_code=True, enforce_eager=True, max_model_len=32768, tensor_parallel_size=2)

Qwen2.5-72B-Instruct-GPTQ-Int4 fails with the error: Weight input_size_per_partition = 14784 is not divisible by min_thread_k = 128.

vllm 0.5.0.post1 + Qwen2-72B-Instruct-GPTQ-Int4 works fine.
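For context, the divisibility complaint can be reproduced with a quick arithmetic check. This is an editor's sketch: min_thread_k = 128 is taken from the error message, and `shard_ok` is a hypothetical helper, not a vLLM API.

```python
MIN_THREAD_K = 128  # constraint reported in the vLLM error message

def shard_ok(intermediate_size: int, tp_size: int) -> bool:
    """True if each tensor-parallel shard of the MLP input dimension is a
    multiple of min_thread_k, as the GPTQ (Marlin) kernel requires."""
    per_partition = intermediate_size // tp_size
    return per_partition % MIN_THREAD_K == 0

print(shard_ok(29568, 2))  # False: 29568 / 2 = 14784, and 14784 % 128 == 64
print(shard_ok(29696, 2))  # True:  29696 / 2 = 14848 = 116 * 128
```

With tensor_parallel_size=2, an intermediate_size of 29568 yields the 14784 shard from the error, while the official 29696 splits cleanly.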

jklj077 commented 2 days ago

14784*2=29568

The official Qwen2.5-72B-Instruct-GPTQ-Int4 model should have intermediate_size = 29696 (see https://huggingface.co/Qwen/Qwen2.5-72B-Instruct-GPTQ-Int4/blob/de16ae5d56f73657b43d4cab6c4925600aa6de8d/config.json#L11).

If you have quantized the model yourself, please refer to our documentation: https://qwen.readthedocs.io/zh-cn/latest/quantization/gptq.html#troubleshooting
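The troubleshooting guide linked above resolves this by padding the MLP weights to a larger intermediate_size. A rough way to compute the target size, sketched here under the assumption that each tensor-parallel shard must be a multiple of min_thread_k = 128 (`padded_intermediate_size` is a hypothetical helper, not from the docs):

```python
def padded_intermediate_size(current: int, tp_size: int, min_thread_k: int = 128) -> int:
    """Smallest size >= current such that each of tp_size tensor-parallel
    shards is a multiple of min_thread_k, i.e. the ceiling of `current`
    to a multiple of min_thread_k * tp_size."""
    step = min_thread_k * tp_size
    return -(-current // step) * step  # integer ceiling division, then scale

print(padded_intermediate_size(29568, 2))  # 29696, matching the official config
```

For the g32 quant's 29568 at tensor_parallel_size=2 this yields 29696, which is exactly the value in the official config.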

hyliush commented 2 days ago

Thanks. I was previously using the g32 version from ModelScope, whose default configuration value was 29568.



YChengxin commented 2 days ago

14784*2=29568

The official Qwen2.5-72B-Instruct-GPTQ-Int4 model should be 29696.

If you have quantized the model yourself, please refer to our documentation: https://qwen.readthedocs.io/zh-cn/latest/quantization/gptq.html#troubleshooting

After padding there is only one model file, '/path/to/padded_model/pytorch_model.bin' (130 GB+). I copied config.json and the other files not ending in .safetensors over, but loading the model for quantization fails. What could be the reason?

Traceback (most recent call last):
  File "/app/gptq_qwen.py", line 65, in <module>
    model = AutoGPTQForCausalLM.from_pretrained(model_path, quantize_config)
  File "/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/auto.py", line 76, in from_pretrained
    return GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/auto_gptq/modeling/_base.py", line 787, in from_pretrained
    model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path, **merged_kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/auto/auto_factory.py", line 564, in from_pretrained
    return model_class.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 3960, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 4414, in _load_pretrained_model
    state_dict = load_state_dict(shard_file, is_quantized=is_quantized)
  File "/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py", line 548, in load_state_dict
    with safe_open(checkpoint_file, framework="pt") as f:
FileNotFoundError: No such file or directory: "/model_cache/padded_model/model-00001-of-00082.safetensors"
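A likely cause (an editor's reading of the traceback, not confirmed in the thread): copying every file that does not end in .safetensors also copies model.safetensors.index.json, so transformers resolves the checkpoint to the sharded .safetensors files listed in that index instead of the single pytorch_model.bin. A minimal sketch that removes the stale index; the directory path and helper name are hypothetical:

```python
import os

def drop_stale_safetensors_index(model_dir: str) -> list:
    """Remove leftover safetensors index/weight files that point at shards
    which no longer exist after the checkpoint was re-saved as a single
    pytorch_model.bin, so from_pretrained falls back to the .bin file."""
    removed = []
    for name in ("model.safetensors.index.json", "model.safetensors"):
        path = os.path.join(model_dir, name)
        if os.path.exists(path):
            os.remove(path)
            removed.append(name)
    return removed

# Hypothetical usage on the padded checkpoint directory:
# drop_stale_safetensors_index("/model_cache/padded_model")
```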