ModelCloud / GPTQModel

Production-ready LLM compression/quantization toolkit with accelerated inference support for both CPU and GPU via HF, vLLM, and SGLang.
Apache License 2.0

[BUG] Some examples are broken with `ValueError: Expected a 1-dimensional tensor for 'input_ids', but got a tensor with 2 dimensions.` #306

Closed: vhain closed this issue 3 months ago.

vhain commented 3 months ago

Describe the bug

The basic_usage_wikitext2.py example is broken.

examples/quantization/basic_usage_wikitext2.py#L36:

        inp = trainenc.input_ids[:, i:j]

collides with the dimension check in gptqmodel/models/base.py#L216-L219:

                if input_ids.dim() == 1:
                    input_ids_length = input_ids.shape[0]
                else:
                    raise ValueError("Expected a 1-dimensional tensor for 'input_ids', but got a tensor with {0} dimensions.".format(input_ids.dim()))

which raises the following exception:

ValueError: Expected a 1-dimensional tensor for 'input_ids', but got a tensor with 2 dimensions.
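The failure can be reproduced without loading a model. The sketch below copies the quoted check from base.py into a standalone function (the name `check_input_ids` is hypothetical, not part of GPTQModel's API) and feeds it a slice taken the way the example does. It assumes `trainenc.input_ids` has shape `(1, N)`, as a Hugging Face tokenizer returns with `return_tensors="pt"`; a made-up `torch.arange` tensor stands in for the real tokens.

```python
import torch


def check_input_ids(input_ids: torch.Tensor) -> int:
    # Standalone replica of the check quoted from gptqmodel/models/base.py#L216-L219
    if input_ids.dim() == 1:
        return input_ids.shape[0]
    raise ValueError(
        "Expected a 1-dimensional tensor for 'input_ids', but got a tensor with {0} dimensions.".format(input_ids.dim())
    )


# Stand-in for trainenc.input_ids: one tokenized stream of 1000 ids, shape (1, 1000)
input_ids = torch.arange(1000).unsqueeze(0)
i, j = 100, 612  # a 512-token window, as the example samples

try:
    check_input_ids(input_ids[:, i:j])  # shape (1, 512): two dimensions
except ValueError as err:
    print(err)
# prints: Expected a 1-dimensional tensor for 'input_ids', but got a tensor with 2 dimensions.
```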

I think the example should be fixed to:

-        inp = trainenc.input_ids[:, i:j]
+        inp = trainenc.input_ids[0, i:j]
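Why the one-character change works: indexing the batch dimension with the integer `0` drops that dimension, while the slice `:` keeps it. The sketch again uses a made-up `(1, 1000)` tensor as a stand-in for `trainenc.input_ids`. `.squeeze(0)` appears only as an equivalent way to get the same 1-D result.

```python
import torch

# Hypothetical stand-in for trainenc.input_ids: a batch of one sequence, shape (1, 1000)
input_ids = torch.arange(1000).unsqueeze(0)
i, j = 100, 612  # a window of 512 tokens

old = input_ids[:, i:j]  # slice keeps the batch dim   -> shape (1, 512)
new = input_ids[0, i:j]  # integer index drops it      -> shape (512,)

print(old.shape, old.dim())  # torch.Size([1, 512]) 2
print(new.shape, new.dim())  # torch.Size([512]) 1

# Both select the same tokens; squeezing the old slice yields the same 1-D tensor
assert torch.equal(new, old.squeeze(0))
```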

GPU Info

Unrelated to the bug, but included for completeness:

$ nvidia-smi
Sat Jul 27 05:23:24 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA H100 80GB HBM3          On  | 00000000:61:00.0 Off |                    0 |
| N/A   53C    P0             615W / 700W |  11471MiB / 81559MiB |     89%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA H100 80GB HBM3          On  | 00000000:62:00.0 Off |                    0 |
| N/A   33C    P0              70W / 700W |      4MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA H100 80GB HBM3          On  | 00000000:63:00.0 Off |                    0 |
| N/A   25C    P0              70W / 700W |      4MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA H100 80GB HBM3          On  | 00000000:64:00.0 Off |                    0 |
| N/A   25C    P0              72W / 700W |      4MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA H100 80GB HBM3          On  | 00000000:6A:00.0 Off |                    0 |
| N/A   24C    P0              68W / 700W |      4MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   5  NVIDIA H100 80GB HBM3          On  | 00000000:6B:00.0 Off |                    0 |
| N/A   25C    P0              74W / 700W |      4MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   6  NVIDIA H100 80GB HBM3          On  | 00000000:6C:00.0 Off |                    0 |
| N/A   25C    P0              67W / 700W |      4MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   7  NVIDIA H100 80GB HBM3          On  | 00000000:6D:00.0 Off |                    0 |
| N/A   23C    P0              68W / 700W |      4MiB / 81559MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A    273930      C   python                                    11458MiB |
+---------------------------------------------------------------------------------------+

Software Info

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:    22.04
Codename:   jammy

$ python --version
Python 3.10.12
$ pip show gptqmodel torch transformers accelerate triton
Name: gptqmodel
Version: 0.9.10.dev0+cu121
Summary: A LLM quantization package with user-friendly apis. Based on GPTQ algorithm.
Home-page: https://github.com/ModelCloud/GPTQModel
Author: ModelCloud
Author-email:
License: UNKNOWN
Location: /home/ubuntu/GPTQModel/.venv/lib/python3.10/site-packages
Requires: accelerate, auto-round, datasets, gekko, huggingface-hub, intel_extension_for_transformers, ninja, numpy, packaging, protobuf, rouge, safetensors, sentencepiece, threadpoolctl, torch, tqdm, transformers, triton
Required-by:
---
Name: torch
Version: 2.4.0
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/ubuntu/GPTQModel/.venv/lib/python3.10/site-packages
Requires: filelock, fsspec, jinja2, networkx, nvidia-cublas-cu12, nvidia-cuda-cupti-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-runtime-cu12, nvidia-cudnn-cu12, nvidia-cufft-cu12, nvidia-curand-cu12, nvidia-cusolver-cu12, nvidia-cusparse-cu12, nvidia-nccl-cu12, nvidia-nvtx-cu12, sympy, triton, typing-extensions
Required-by: accelerate, auto-round, auto_gptq, gptqmodel, peft
---
Name: transformers
Version: 4.43.3
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /home/ubuntu/GPTQModel/.venv/lib/python3.10/site-packages
Requires: filelock, huggingface-hub, numpy, packaging, pyyaml, regex, requests, safetensors, tokenizers, tqdm
Required-by: auto-round, auto_gptq, gptqmodel, intel-extension-for-transformers, peft
---
Name: accelerate
Version: 0.33.0
Summary: Accelerate
Home-page: https://github.com/huggingface/accelerate
Author: The HuggingFace team
Author-email: zach.mueller@huggingface.co
License: Apache
Location: /home/ubuntu/GPTQModel/.venv/lib/python3.10/site-packages
Requires: huggingface-hub, numpy, packaging, psutil, pyyaml, safetensors, torch
Required-by: auto-round, auto_gptq, gptqmodel, peft
---
Name: triton
Version: 3.0.0
Summary: A language and compiler for custom Deep Learning operations
Home-page: https://github.com/triton-lang/triton/
Author: Philippe Tillet
Author-email: phil@openai.com
License:
Location: /home/ubuntu/GPTQModel/.venv/lib/python3.10/site-packages
Requires: filelock
Required-by: gptqmodel, torch

To Reproduce

$ python examples/quantization/basic_usage_wikitext2.py

Expected behavior

The example runs without raising a ValueError.

Model/Datasets

Both are hardcoded in the example.

Screenshots

$ CUDA_VISIBLE_DEVICES=1 python examples/quantization/basic_usage_wikitext2.py
/home/ubuntu/GPTQModel/.venv/lib/python3.10/site-packages/gptqmodel/nn_modules/triton_utils/dequant.py:123: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, qweight, scales, qzeros, g_idx, bits, maxq):
/home/ubuntu/GPTQModel/.venv/lib/python3.10/site-packages/gptqmodel/nn_modules/triton_utils/dequant.py:131: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
tokenizer_config.json: 100%|██████████████████████████████████████████████████████| 685/685 [00:00<00:00, 8.05MB/s]
config.json: 100%|████████████████████████████████████████████████████████████████| 651/651 [00:00<00:00, 9.10MB/s]
vocab.json: 100%|███████████████████████████████████████████████████████████████| 899k/899k [00:00<00:00, 7.29MB/s]
merges.txt: 100%|███████████████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 13.1MB/s]
special_tokens_map.json: 100%|████████████████████████████████████████████████████| 441/441 [00:00<00:00, 6.31MB/s]
pytorch_model.bin: 100%|█████████████████████████████████████████████████████████| 251M/251M [00:00<00:00, 422MB/s]
generation_config.json: 100%|█████████████████████████████████████████████████████| 137/137 [00:00<00:00, 1.89MB/s]
WARNING - Calibration dataset size should be greater than 256. Current size: 128.
Traceback (most recent call last):
  File "/home/ubuntu/GPTQModel/examples/quantization/basic_usage_wikitext2.py", line 171, in <module>
    main()
  File "/home/ubuntu/GPTQModel/examples/quantization/basic_usage_wikitext2.py", line 148, in main
    model.quantize(traindataset)
  File "/home/ubuntu/GPTQModel/.venv/lib/python3.10/site-packages/gptqmodel/models/base.py", line 219, in quantize
    raise ValueError("Expected a 1-dimensional tensor for 'input_ids', but got a tensor with {0} dimensions.".format(input_ids.dim()))
ValueError: Expected a 1-dimensional tensor for 'input_ids', but got a tensor with 2 dimensions.

Additional context

The example seems to have been taken directly from AutoGPTQ. AutoGPTQ appears to accept 2-dimensional tensors, whereas GPTQModel requires a 1-dimensional tensor.

The examples should produce 1-dimensional input_ids tensors so they can be passed to model.quantize() correctly.
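For reference, a minimal sketch of a calibration-set builder that yields 1-dimensional tensors. It assumes the example follows the AutoGPTQ pattern of sampling random fixed-length windows from one long tokenized stream and pairing each window with an all-ones attention mask. The function name `build_calibration_windows` and its parameters are hypothetical, and a random tensor stands in for the real tokenized WikiText-2 text.

```python
import random

import torch


def build_calibration_windows(input_ids: torch.Tensor, nsamples: int, seqlen: int, seed: int = 0):
    """Sample `nsamples` random windows of `seqlen` tokens from a (1, N) token stream.

    Every returned entry holds 1-D tensors, which is what the check in quantize() expects.
    """
    assert input_ids.dim() == 2 and input_ids.shape[0] == 1, "expects one stream of shape (1, N)"
    rng = random.Random(seed)  # local RNG so the sampled windows are reproducible
    dataset = []
    for _ in range(nsamples):
        i = rng.randint(0, input_ids.shape[1] - seqlen - 1)
        j = i + seqlen
        inp = input_ids[0, i:j]  # integer index on the batch dim -> shape (seqlen,)
        attention_mask = torch.ones_like(inp)  # inherits the 1-D shape
        dataset.append({"input_ids": inp, "attention_mask": attention_mask})
    return dataset


# Hypothetical stand-in for trainenc.input_ids: 10,000 random token ids, shape (1, 10000)
fake_ids = torch.randint(0, 50_000, (1, 10_000))
calib = build_calibration_windows(fake_ids, nsamples=128, seqlen=512)
print(len(calib), calib[0]["input_ids"].shape)  # 128 torch.Size([512])
```

The resulting list would be passed to model.quantize() the same way the example passes traindataset. Because the attention mask is built with `torch.ones_like(inp)`, fixing the slice also makes the mask 1-D without a separate change.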

Qubitium commented 3 months ago

Thanks for finding the issue. We will fix the examples so they work correctly with gptqmodel.