NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Assertion failed: Can't free tmp workspace for GEMM tactics profiling. When using int4 checkpoint #1850

Open gyr66 opened 6 days ago

gyr66 commented 6 days ago

System Info

Who can help?

@Tra

Information

Tasks

Reproduction

  1. Follow the instructions at https://nvidia.github.io/TensorRT-LLM/installation/build-from-source-linux.html#building-a-tensorrt-llm-docker-image to install TensorRT-LLM.
  2. Download the chatglm3_6b model weights from Hugging Face.
  3. Run python3 convert_checkpoint.py --model_dir chatglm3_6b --use_weight_only --weight_only_precision int4 --output_dir trt_ckpt/chatglm3_6b/int4wo/1-gpu to get an int4 checkpoint.
  4. Run trtllm-build --checkpoint_dir trt_ckpt/chatglm3_6b/int4wo/1-gpu --gpt_attention_plugin float16 --gemm_plugin float16 --max_batch_size 1 --max_input_len 512 --max_output_len 512 --output_dir trt_engines/chatglm3_6b/int4wo/1-gpu to build the engine (steps 2-4 are collected into a single script below).
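
For convenience, here are steps 2-4 as one script. The huggingface-cli call and the THUDM/chatglm3-6b repo id are an assumed way to do the download (any other method works); the convert and build commands are exactly the ones listed above.

    #!/usr/bin/env bash
    set -euo pipefail

    # Step 2 (assumed download method): fetch the weights from Hugging Face.
    huggingface-cli download THUDM/chatglm3-6b --local-dir chatglm3_6b

    # Step 3: produce the int4 weight-only checkpoint.
    python3 convert_checkpoint.py --model_dir chatglm3_6b \
        --use_weight_only --weight_only_precision int4 \
        --output_dir trt_ckpt/chatglm3_6b/int4wo/1-gpu

    # Step 4: build the engine (this is where the assertion fires).
    trtllm-build --checkpoint_dir trt_ckpt/chatglm3_6b/int4wo/1-gpu \
        --gpt_attention_plugin float16 \
        --gemm_plugin float16 \
        --max_batch_size 1 \
        --max_input_len 512 \
        --max_output_len 512 \
        --output_dir trt_engines/chatglm3_6b/int4wo/1-gpu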

Expected behavior

The engine builds successfully.

Actual behavior

Get the error:

......
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=1, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=2, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=4, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=8, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=16, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=32, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=64, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=128, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=256, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=512, n=2304, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=1024, n=2304, k=4096). Will try to use default or fail at runtime
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
  what():  [TensorRT-LLM][ERROR] Assertion failed: Can't free tmp workspace for GEMM tactics profiling. (/src/tensorrt_llm/cpp/tensorrt_llm/plugins/common/gemmPluginProfiler.cpp:204)
1  0x7f67c5319622 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x7f622) [0x7f67c5319622]
2  0x7f67c53f8058 tensorrt_llm::plugins::GemmPluginProfiler<tensorrt_llm::cutlass_extensions::CutlassGemmConfig, std::shared_ptr, tensorrt_llm::plugins::GemmIdCore, tensorrt_llm::plugins::GemmIdCoreHash>::freeTmpData() + 104
3  0x7f67c5403f2b tensorrt_llm::plugins::GemmPluginProfiler<tensorrt_llm::cutlass_extensions::CutlassGemmConfig, std::shared_ptr, tensorrt_llm::plugins::GemmIdCore, tensorrt_llm::plugins::GemmIdCoreHash>::profileTactics(std::shared_ptr const&, nvinfer1::DataType const&, tensorrt_llm::plugins::GemmDims const&, tensorrt_llm::plugins::GemmIdCore const&) + 1131
4  0x7f67c53d72ad tensorrt_llm::plugins::WeightOnlyQuantMatmulPlugin::initialize() + 13
5  0x7f69c93bc6e5 /usr/local/tensorrt/lib/libnvinfer.so.10(+0x108c6e5) [0x7f69c93bc6e5]
6  0x7f69c9349de2 /usr/local/tensorrt/lib/libnvinfer.so.10(+0x1019de2) [0x7f69c9349de2]
7  0x7f69c913456c /usr/local/tensorrt/lib/libnvinfer.so.10(+0xe0456c) [0x7f69c913456c]
8  0x7f69c913621c /usr/local/tensorrt/lib/libnvinfer.so.10(+0xe0621c) [0x7f69c913621c]
9  0x7f69c9138328 /usr/local/tensorrt/lib/libnvinfer.so.10(+0xe08328) [0x7f69c9138328]
10 0x7f69c8d872ac /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa572ac) [0x7f69c8d872ac]
11 0x7f69c8d8c501 /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa5c501) [0x7f69c8d8c501]
12 0x7f69c8d8cf0b /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa5cf0b) [0x7f69c8d8cf0b]
13 0x7f69d70a7458 /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0xa7458) [0x7f69d70a7458]
14 0x7f69d70458f3 /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0x458f3) [0x7f69d70458f3]
15 0x562b59fef10e /usr/bin/python(+0x15a10e) [0x562b59fef10e]
16 0x562b59fe5a7b _PyObject_MakeTpCall + 603
17 0x562b59ffdacb /usr/bin/python(+0x168acb) [0x562b59ffdacb]
18 0x562b59fddcfa _PyEval_EvalFrameDefault + 24906
19 0x562b59fef9fc _PyFunction_Vectorcall + 124
20 0x562b59fda5d7 _PyEval_EvalFrameDefault + 10791
21 0x562b59fef9fc _PyFunction_Vectorcall + 124
22 0x562b59fd845c _PyEval_EvalFrameDefault + 2220
23 0x562b59fef9fc _PyFunction_Vectorcall + 124
24 0x562b59fd826d _PyEval_EvalFrameDefault + 1725
25 0x562b59fef9fc _PyFunction_Vectorcall + 124
26 0x562b59ffe492 PyObject_Call + 290
27 0x562b59fda5d7 _PyEval_EvalFrameDefault + 10791
28 0x562b59fef9fc _PyFunction_Vectorcall + 124
29 0x562b59ffe492 PyObject_Call + 290
30 0x562b59fda5d7 _PyEval_EvalFrameDefault + 10791
31 0x562b59fef9fc _PyFunction_Vectorcall + 124
32 0x562b59ffe492 PyObject_Call + 290
33 0x562b59fda5d7 _PyEval_EvalFrameDefault + 10791
34 0x562b59fef9fc _PyFunction_Vectorcall + 124
35 0x562b59fd826d _PyEval_EvalFrameDefault + 1725
36 0x562b59fd49c6 /usr/bin/python(+0x13f9c6) [0x562b59fd49c6]
37 0x562b5a0ca256 PyEval_EvalCode + 134
38 0x562b5a0f5108 /usr/bin/python(+0x260108) [0x562b5a0f5108]
39 0x562b5a0ee9cb /usr/bin/python(+0x2599cb) [0x562b5a0ee9cb]
40 0x562b5a0f4e55 /usr/bin/python(+0x25fe55) [0x562b5a0f4e55]
41 0x562b5a0f4338 _PyRun_SimpleFileObject + 424
42 0x562b5a0f3f83 _PyRun_AnyFileObject + 67
43 0x562b5a0e6a5e Py_RunMain + 702
44 0x562b5a0bd02d Py_BytesMain + 45
45 0x7f6a1d2e1d90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6a1d2e1d90]
46 0x7f6a1d2e1e40 __libc_start_main + 128
47 0x562b5a0bcf25 _start + 37
[alpha-release:06844] *** Process received signal ***
[alpha-release:06844] Signal: Aborted (6)
[alpha-release:06844] Signal code:  (-6)
[alpha-release:06844] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6a1d2fa520]
[alpha-release:06844] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f6a1d34e9fc]
[alpha-release:06844] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f6a1d2fa476]
[alpha-release:06844] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f6a1d2e07f3]
[alpha-release:06844] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e)[0x7f69c5a4fb9e]
[alpha-release:06844] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7f69c5a5b20c]
[alpha-release:06844] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad1e9)[0x7f69c5a5a1e9]
[alpha-release:06844] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x99)[0x7f69c5a5a959]
[alpha-release:06844] [ 8] /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(+0x16884)[0x7f6a1cdcb884]
[alpha-release:06844] [ 9] /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0x12d)[0x7f6a1cdcc2dd]
[alpha-release:06844] [10] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(_ZN12tensorrt_llm7plugins18GemmPluginProfilerINS_18cutlass_extensions17CutlassGemmConfigESt10shared_ptrINS_7kernels15cutlass_kernels33CutlassFpAIntBGemmRunnerInterfaceEENS0_10GemmIdCoreENS0_14GemmIdCoreHashEE14profileTacticsERKS8_RKN8nvinfer18DataTypeERKNS0_8GemmDimsERKS9_+0x781)[0x7f67c5404241]
[alpha-release:06844] [11] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(_ZN12tensorrt_llm7plugins27WeightOnlyQuantMatmulPlugin10initializeEv+0xd)[0x7f67c53d72ad]
[alpha-release:06844] [12] /usr/local/tensorrt/lib/libnvinfer.so.10(+0x108c6e5)[0x7f69c93bc6e5]
[alpha-release:06844] [13] /usr/local/tensorrt/lib/libnvinfer.so.10(+0x1019de2)[0x7f69c9349de2]
[alpha-release:06844] [14] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xe0456c)[0x7f69c913456c]
[alpha-release:06844] [15] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xe0621c)[0x7f69c913621c]
[alpha-release:06844] [16] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xe08328)[0x7f69c9138328]
[alpha-release:06844] [17] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa572ac)[0x7f69c8d872ac]
[alpha-release:06844] [18] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa5c501)[0x7f69c8d8c501]
[alpha-release:06844] [19] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa5cf0b)[0x7f69c8d8cf0b]
[alpha-release:06844] [20] /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0xa7458)[0x7f69d70a7458]
[alpha-release:06844] [21] /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0x458f3)[0x7f69d70458f3]
[alpha-release:06844] [22] /usr/bin/python(+0x15a10e)[0x562b59fef10e]
[alpha-release:06844] [23] /usr/bin/python(_PyObject_MakeTpCall+0x25b)[0x562b59fe5a7b]
[alpha-release:06844] [24] /usr/bin/python(+0x168acb)[0x562b59ffdacb]
[alpha-release:06844] [25] /usr/bin/python(_PyEval_EvalFrameDefault+0x614a)[0x562b59fddcfa]
[alpha-release:06844] [26] /usr/bin/python(_PyFunction_Vectorcall+0x7c)[0x562b59fef9fc]
[alpha-release:06844] [27] /usr/bin/python(_PyEval_EvalFrameDefault+0x2a27)[0x562b59fda5d7]
[alpha-release:06844] [28] /usr/bin/python(_PyFunction_Vectorcall+0x7c)[0x562b59fef9fc]
[alpha-release:06844] [29] /usr/bin/python(_PyEval_EvalFrameDefault+0x8ac)[0x562b59fd845c]
[alpha-release:06844] *** End of error message ***
Aborted (core dumped)
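
Since the box only has 6 GB of GPU memory (see the note below), one thing worth checking is whether memory fills up right before the profiler aborts. A minimal polling sketch using standard nvidia-smi query flags, run alongside trtllm-build (nothing TensorRT-LLM-specific; GPU index 0 is an assumption):

    # Print used/total MiB on GPU 0 once per second while the build runs.
    while sleep 1; do
      nvidia-smi --id=0 --query-gpu=memory.used,memory.total --format=csv,noheader
    done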

additional notes

I chose int4 because I only have 6 GB of GPU memory, and int8 would need about 6.4 GB.
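
As a rough sanity check on those numbers (the ~6.2B parameter count for chatglm3_6b is an assumption, and the figures cover weights only, not activations or workspace):

    # Back-of-envelope weight memory for a ~6.2B-parameter model.
    awk 'BEGIN {
      p = 6.2e9                                           # assumed parameter count
      printf "fp16 weights: %4.1f GiB\n", p * 2.0 / 2^30  # ~11.5 GiB
      printf "int8 weights: %4.1f GiB\n", p * 1.0 / 2^30  # ~5.8 GiB
      printf "int4 weights: %4.1f GiB\n", p * 0.5 / 2^30  # ~2.9 GiB
    }'

So the int4 weights alone should fit in 6 GB with room to spare, which is why the build-time failure is surprising.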

gyr66 commented 6 days ago

I tried --gemm_plugin disable, but that does not help; I still get the error above. Could you please help me out? Thanks very much!
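
Concretely, that was the same trtllm-build invocation as in the reproduction steps, with only the GEMM plugin flag changed (reconstructed here for clarity):

    trtllm-build --checkpoint_dir trt_ckpt/chatglm3_6b/int4wo/1-gpu \
        --gpt_attention_plugin float16 \
        --gemm_plugin disable \
        --max_batch_size 1 \
        --max_input_len 512 \
        --max_output_len 512 \
        --output_dir trt_engines/chatglm3_6b/int4wo/1-gpu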