GPTQ quant build engines AssertionError

[02/02/2022-06:42:51] [TRT-LLM] [I] Serially build TensorRT engines.
[02/02/2022-06:42:52] [TRT] [I] [MemUsageChange] Init CUDA: CPU +267, GPU +0, now: CPU 390, GPU 16638 (MiB)
[02/02/2022-06:42:53] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +267, GPU +242, now: CPU 675, GPU 16898 (MiB)
[02/02/2022-06:42:53] [TRT-LLM] [W] Invalid timing cache, using freshly created one
[02/02/2022-06:43:01] [TRT-LLM] [I] Loading weights from groupwise GPTQ LLaMA safetensors...
/usr/local/lib/python3.8/dist-packages/torch/storage.py:315: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "build.py", line 718, in <module>
    build(0, args)
  File "build.py", line 689, in build
    engine = build_rank_engine(builder, builder_config, engine_name,
  File "build.py", line 543, in build_rank_engine
    load_func(tensorrt_llm_llama=tensorrt_llm_llama,
  File "/workspace/TensorRT-LLM/examples/llama/weight.py", line 1017, in load_from_gptq_llama
    tensorrt_llm_llama.layers[
  File "/usr/local/lib/python3.8/dist-packages/tensorrt_llm/parameter.py", line 66, in value
    assert v.shape == self._value.shape, \
AssertionError: ('The value updated is not the same shape as the original. ', 'Updated: (68, 3200), original: (67, 3200)')

I hit the error above when building the engines after completing GPTQ quantization. I am using TensorRT-LLM 0.5.0.
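The two shapes in the assertion differ only in the first dimension (68 vs 67). One plausible reading, and it is an assumption since the issue does not name the model or the GPTQ settings: the tensor being loaded is a groupwise scale of shape (in_features / group_size, out_features), the GPTQ checkpoint rounds the group count up while the TensorRT-LLM layer rounds it down, and in_features is not a multiple of the group size. With group_size=128 and a hypothetical in_features of 8640, 8640 / 128 = 67.5, which yields exactly 68 vs 67. The minimal sketch below (plain Python; the helper names and layer dimensions are hypothetical, not TensorRT-LLM APIs) checks a list of layer dimensions for that condition.

```python
import math
from typing import Optional

# Assumed group size; the issue does not state which one was used.
GROUP_SIZE = 128


def ceil_scale_rows(in_features: int, group_size: int) -> int:
    """Scale rows if a trailing partial group gets its own row (ceil division)."""
    return math.ceil(in_features / group_size)


def floor_scale_rows(in_features: int, group_size: int) -> int:
    """Scale rows if the group count is computed with integer (floor) division."""
    return in_features // group_size


def check_layer(name: str, in_features: int, out_features: int,
                group_size: int = GROUP_SIZE) -> Optional[str]:
    """Return a mismatch message, or None when both conventions agree."""
    ckpt = (ceil_scale_rows(in_features, group_size), out_features)
    expected = (floor_scale_rows(in_features, group_size), out_features)
    if ckpt != expected:
        return f"{name}: checkpoint {ckpt} vs expected {expected}"
    return None


if __name__ == "__main__":
    # Hypothetical (name, in_features, out_features) entries; 3200 matches the
    # second dimension in the error, 8640 is an assumed input dimension.
    layers = [("attn_dense", 3200, 3200), ("mlp_down", 8640, 3200)]
    for name, fin, fout in layers:
        msg = check_layer(name, fin, fout)
        print(msg or f"{name}: OK")
```

Under these assumptions, only layers whose input dimension is not a multiple of the group size are flagged; here that is the hypothetical `mlp_down` entry, reproducing the (68, 3200) vs (67, 3200) pair from the traceback.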

NVIDIA / TensorRT-LLM, issue #866