NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

GPTQ quant build engines AssertionError #866

Open jpf888 opened 8 months ago

jpf888 commented 8 months ago

```
[02/02/2022-06:42:51] [TRT-LLM] [I] Serially build TensorRT engines.
[02/02/2022-06:42:52] [TRT] [I] [MemUsageChange] Init CUDA: CPU +267, GPU +0, now: CPU 390, GPU 16638 (MiB)
[02/02/2022-06:42:53] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +267, GPU +242, now: CPU 675, GPU 16898 (MiB)
[02/02/2022-06:42:53] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[02/02/2022-06:43:01] [TRT-LLM] [I] Loading weights from groupwise GPTQ LLaMA safetensors...
/usr/local/lib/python3.8/dist-packages/torch/storage.py:315: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "build.py", line 718, in <module>
    build(0, args)
  File "build.py", line 689, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "build.py", line 543, in build_rank_engine
    load_func(tensorrt_llm_llama=tensorrt_llm_llama,
  File "/workspace/TensorRT-LLM/examples/llama/weight.py", line 1017, in load_from_gptq_llama
    tensorrt_llm_llama.layers[
  File "/usr/local/lib/python3.8/dist-packages/tensorrt_llm/parameter.py", line 66, in value
    assert v.shape == self._value.shape, \
AssertionError: ('The value updated is not the same shape as the original. ', 'Updated: (68, 3200), original: (67, 3200)')
```

After completing GPTQ quantization, I hit the error above while building the engine. I am using TensorRT-LLM 0.5.0.
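The two shapes in the assertion differ only in the first dimension (68 vs 67), which for a groupwise-quantized layer would be the number of quantization groups. One plausible reading, offered as an assumption rather than a diagnosis: if a layer's input dimension is not a multiple of the GPTQ group size, a checkpoint that rounds the group count up and a loader that rounds it down will disagree by exactly one group. The sketch below shows that arithmetic. Only 3200 comes from the traceback; `in_features = 8640` and `group_size = 128` are assumed values (an OpenLLaMA-3B-style MLP), and the helper names are hypothetical, not TensorRT-LLM APIs.

```python
import math

# ASSUMED values: only HIDDEN_SIZE (3200) appears in the traceback.
# INTERMEDIATE_SIZE and GROUP_SIZE are illustrative guesses.
HIDDEN_SIZE = 3200
INTERMEDIATE_SIZE = 8640
GROUP_SIZE = 128


def num_groups_floor(in_features: int, group_size: int) -> int:
    """Group count when a trailing partial group is dropped."""
    return in_features // group_size


def num_groups_ceil(in_features: int, group_size: int) -> int:
    """Group count when a trailing partial group is kept."""
    return math.ceil(in_features / group_size)


def scales_shape(in_features: int, out_features: int,
                 group_size: int, round_up: bool) -> tuple:
    """Hypothetical helper: per-group scales laid out as (groups, out_features)."""
    count = num_groups_ceil if round_up else num_groups_floor
    return (count(in_features, group_size), out_features)


if __name__ == "__main__":
    ckpt = scales_shape(INTERMEDIATE_SIZE, HIDDEN_SIZE, GROUP_SIZE, round_up=True)
    engine = scales_shape(INTERMEDIATE_SIZE, HIDDEN_SIZE, GROUP_SIZE, round_up=False)
    print("rounded up:  ", ckpt)    # (68, 3200)
    print("rounded down:", engine)  # (67, 3200)
    if INTERMEDIATE_SIZE % GROUP_SIZE != 0:
        print("in_features is not a multiple of group_size, so the shapes disagree")
```

With these assumed numbers, 8640 / 128 = 67.5, so rounding up gives 68 groups and rounding down gives 67, matching the two shapes in the error. When the input dimension is an exact multiple of the group size (for example 4096 / 128), both conventions agree and no mismatch would appear.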

YooSungHyun commented 7 months ago

How did you run the scripts (what commands and arguments did you use)?