[TensorRT-LLM][WARNING] The manually set model data type is torch.float16, but the data type of the HuggingFace model is torch.float32.
Initializing tokenizer from model
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
AWQ calibration could take longer than other calibration methods. Please increase the batch size to speed up the calibration process. Batch size can be set by adding the argument --batch_size to the command line.
Loading calibration dataset
[NeMo W 2024-07-31 06:37:10 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by promote_options='default'.
table = cls._concat_blocks(blocks, axis=0)
[NeMo W 2024-07-31 06:38:15 nemo_logging:349] /usr/local/lib/python3.10/dist-packages/ammo/torch/quantization/nn/modules/tensor_quantizer.py:155: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
value = torch.tensor(value, device=self._pre_quant_scale.device)
Traceback (most recent call last):
  File "/app/TensorRT-LLM/examples/quantization/quantize.py", line 364, in <module>
    main(args)
  File "/app/TensorRT-LLM/examples/quantization/quantize.py", line 284, in main
    model = quantize_model(model, quant_cfg, calib_dataloader)
  File "/app/TensorRT-LLM/examples/quantization/quantize.py", line 221, in quantize_model
    atq.quantize(model, quant_cfg, forward_loop=calibrate_loop)
  File "/usr/local/lib/python3.10/dist-packages/ammo/torch/quantization/model_quant.py", line 112, in quantize
    calibrate(model, config["algorithm"], forward_loop=forward_loop)
  File "ammo/torch/quantization/model_calib.py", line 59, in ammo.torch.quantization.model_calib.calibrate
  File "ammo/torch/quantization/model_calib.py", line 185, in ammo.torch.quantization.model_calib.awq
  File "ammo/torch/quantization/model_calib.py", line 187, in ammo.torch.quantization.model_calib.awq
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "ammo/torch/quantization/model_calib.py", line 330, in ammo.torch.quantization.model_calib.awq_lite
  File "/app/TensorRT-LLM/examples/quantization/quantize.py", line 217, in calibrate_loop
    model(data)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1181, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 1068, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 796, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 386, in forward
    query_states = self.q_proj(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "ammo/torch/quantization/model_calib.py", line 294, in ammo.torch.quantization.model_calib.awq_lite.forward
NotImplementedError: Cannot copy out of meta tensor; no data!
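
The accelerate/hooks.py frames in the traceback suggest the model was loaded with a device_map, and the meta-tensor error means some weights were never materialized on a real device when AWQ calibration ran. That would be consistent with the float32 warning at the top of the log: an 8B model in float32 needs roughly 32 GB for the weights alone, more than the A10's 24 GB, so accelerate would offload part of the model. A minimal diagnostic sketch to check this, assuming the checkpoint is loaded via transformers with device_map="auto" (the path and dtype mirror the reproduction command below; this is an assumption, not code taken from quantize.py):

```python
# Diagnostic sketch (assumption: the checkpoint is loaded with
# device_map="auto"; path and dtype mirror the repro command below).
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "model",                    # --model_dir
    torch_dtype=torch.float16,  # --dtype float16
    device_map="auto",          # accelerate may offload layers that don't fit
)

# Any parameters left on the "meta" device have no data and would trigger
# "Cannot copy out of meta tensor" during calibration.
meta = [n for n, p in model.named_parameters() if p.device.type == "meta"]
print(f"{len(meta)} parameters on the meta device")
print(meta[:10])
```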
System Info
Getting this error while trying to quantize the Llama 3 8B model with tensorrt_llm 0.9.0.
GPU: A10 (24 GB)
Docker image: 23.10-trtllm-python-py3
Ref: https://github.com/NVIDIA/TensorRT-LLM/issues/1182
Who can help?
@byshiue @Tracin
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
python3 TensorRT-LLM/examples/quantization/quantize.py --model_dir model \
    --output_dir tllm_checkpoint_1gpu_awq \
    --dtype float16 \
    --qformat int4_awq \
    --awq_block_size 128
Expected behavior
A quantized checkpoint written to tllm_checkpoint_1gpu_awq.
Actual behavior
The script fails during AWQ calibration with NotImplementedError: Cannot copy out of meta tensor; no data! (full traceback above).
Additional notes
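A possible workaround, assuming the root cause really is offloading of the oversized fp32 checkpoint: re-save it as float16 first (roughly 16 GB, which fits on the A10) and point --model_dir at the new directory. The model_fp16 path below is just an illustrative name:

```python
# Workaround sketch (assumption: the failure is caused by accelerate
# offloading the oversized fp32 checkpoint). Re-save the checkpoint as fp16
# so the whole model can sit on the GPU during calibration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("model", torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained("model")

model.save_pretrained("model_fp16")      # then rerun with --model_dir model_fp16
tokenizer.save_pretrained("model_fp16")
```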
Python packages