NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Llava 1.5 7B TRT LLM conversion fails on AWS g4 instances #1665

Open Tchaikovic opened 3 months ago

Tchaikovic commented 3 months ago

These are the commands I run:

export MODEL_NAME="llava-1.5-7b-hf"
python3 TensorRT-LLM/examples/llama/convert_checkpoint.py \
    --model_dir models/${MODEL_NAME} \
    --output_dir models/trt_${MODEL_NAME}/fp16/1-gpu \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int4

I get this error:

[image: screenshot of the error]

Is this because there is not enough GPU memory? Any workaround for this?

aliencaocao commented 3 months ago

Yes, not enough VRAM. Generally, whenever you see "meta tensor", it means your tensor has been offloaded to RAM and is therefore not in VRAM.
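As a rough back-of-the-envelope check of the VRAM explanation above, the sketch below estimates the memory needed just to hold the model weights. The parameter count (about 7 billion for the language model) and the nominal 16 GB of a T4 are assumptions that are not stated in the thread, and `weight_memory_gib` is a hypothetical helper name. Activations and temporary buffers used during conversion are ignored.

```python
def weight_memory_gib(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory in GiB needed to hold the weights alone."""
    return num_params * bytes_per_param / 2**30


# Assumed values, not taken from the thread.
NUM_PARAMS = 7e9       # roughly 7B parameters in the language model
T4_VRAM_GIB = 16.0     # nominal T4 memory; the usable amount is lower

fp16_gib = weight_memory_gib(NUM_PARAMS, 2.0)   # 2 bytes per fp16 weight
int4_gib = weight_memory_gib(NUM_PARAMS, 0.5)   # 4 bits per int4 weight

print(f"fp16 weights: {fp16_gib:.1f} GiB")
print(f"int4 weights: {int4_gib:.1f} GiB")
print(f"fp16 headroom on the card: {T4_VRAM_GIB - fp16_gib:.1f} GiB")
```

Under these assumptions the fp16 weights alone take most of the card, which is consistent with the loader falling back to host memory while the unquantized checkpoint is being read.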

nv-guomingz commented 3 months ago

Try the --load_model_on_cpu flag.
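Putting this suggestion together with the original command, the sketch below assembles the conversion command with the flag appended. All paths and options are copied from the first post. The snippet only prints the command; it does not run it.

```python
import shlex

MODEL_NAME = "llava-1.5-7b-hf"

# Same arguments as the original post, plus the suggested flag at the end.
cmd = [
    "python3", "TensorRT-LLM/examples/llama/convert_checkpoint.py",
    "--model_dir", f"models/{MODEL_NAME}",
    "--output_dir", f"models/trt_{MODEL_NAME}/fp16/1-gpu",
    "--dtype", "float16",
    "--use_weight_only",
    "--weight_only_precision", "int4",
    "--load_model_on_cpu",  # load the checkpoint in host memory instead of VRAM
]

print(shlex.join(cmd))
```

To actually launch the conversion from Python, the same list could be passed to `subprocess.run(cmd, check=True)`.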

Tchaikovic commented 3 months ago

@nv-guomingz Thank you! Can I run inference on gpu if I use that flag?

nv-guomingz commented 3 months ago

Yes. Please have a try to see if the issue still exists.

Tchaikovic commented 3 months ago

@nv-guomingz That issue is now resolved. I had to switch to TensorRT-LLM version 0.11.0.dev2024052800 and TensorRT-LLM commit f430a4b447ef4cba22698902d43eae0debf08594.

Now I get a different error at the second step, when I run:

trtllm-build \
    --checkpoint_dir models/trt_${MODEL_NAME}/fp16/1-gpu \
    --output_dir trt_engines/${MODEL_NAME}/int4_weightonly/1-gpu \
    --gpt_attention_plugin float16 \
    --gemm_plugin float16 \
    --max_batch_size 1 \
    --max_input_len 924 \
    --max_output_len 100 \
    --max_multimodal_len 576

Error log

  what():  [TensorRT-LLM][ERROR] Assertion failed: Can't free tmp workspace for GEMM tactics profiling. (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/plugins/common/gemmPluginProfiler.cpp:190)
1       0x7f4273f4224f /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x7724f) [0x7f4273f4224f]
2       0x7f4273ff8cb6 tensorrt_llm::plugins::GemmPluginProfiler<tensorrt_llm::cutlass_extensions::CutlassGemmConfig, std::shared_ptr<tensorrt_llm::kernels::cutlass_kernels::CutlassFpAIntBGemmRunnerInterface>, tensorrt_llm::plugins::GemmIdCore, tensorrt_llm::plugins::GemmIdCoreHash>::freeTmpData() + 70
3       0x7f42740036d3 tensorrt_llm::plugins::GemmPluginProfiler<tensorrt_llm::cutlass_extensions::CutlassGemmConfig, std::shared_ptr<tensorrt_llm::kernels::cutlass_kernels::CutlassFpAIntBGemmRunnerInterface>, tensorrt_llm::plugins::GemmIdCore, tensorrt_llm::plugins::GemmIdCoreHash>::profileTactics(std::shared_ptr<tensorrt_llm::kernels::cutlass_kernels::CutlassFpAIntBGemmRunnerInterface> const&, nvinfer1::DataType const&, tensorrt_llm::plugins::GemmDims const&, tensorrt_llm::plugins::GemmIdCore const&) + 1363
4       0x7f4273fd8949 tensorrt_llm::plugins::WeightOnlyQuantMatmulPlugin::initialize() + 9
5       0x7f443f591a25 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0x1065a25) [0x7f443f591a25]
6       0x7f443f51e0aa /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xff20aa) [0x7f443f51e0aa]
7       0x7f443f30afcf /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xddefcf) [0x7f443f30afcf]
8       0x7f443f30d07c /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xde107c) [0x7f443f30d07c]
9       0x7f443f30f071 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xde3071) [0x7f443f30f071]
10      0x7f443ef5461c /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa2861c) [0x7f443ef5461c]
11      0x7f443ef59837 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa2d837) [0x7f443ef59837]
12      0x7f443ef5a1af /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa2e1af) [0x7f443ef5a1af]
13      0x7f43ea0a6478 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0xa6478) [0x7f43ea0a6478]
14      0x7f43ea0457a3 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0x457a3) [0x7f43ea0457a3]
15      0x55919ee8f10e /usr/bin/python3(+0x15a10e) [0x55919ee8f10e]
16      0x55919ee85a7b _PyObject_MakeTpCall + 603
17      0x55919ee9dacb /usr/bin/python3(+0x168acb) [0x55919ee9dacb]
18      0x55919ee7dcfa _PyEval_EvalFrameDefault + 24906
19      0x55919ee8f9fc _PyFunction_Vectorcall + 124
20      0x55919ee7a5d7 _PyEval_EvalFrameDefault + 10791
21      0x55919ee8f9fc _PyFunction_Vectorcall + 124
22      0x55919ee7845c _PyEval_EvalFrameDefault + 2220
23      0x55919ee8f9fc _PyFunction_Vectorcall + 124
24      0x55919ee7826d _PyEval_EvalFrameDefault + 1725
25      0x55919ee8f9fc _PyFunction_Vectorcall + 124
26      0x55919ee9e492 PyObject_Call + 290
27      0x55919ee7a5d7 _PyEval_EvalFrameDefault + 10791
28      0x55919ee8f9fc _PyFunction_Vectorcall + 124
29      0x55919ee9e492 PyObject_Call + 290
30      0x55919ee7a5d7 _PyEval_EvalFrameDefault + 10791
31      0x55919ee8f9fc _PyFunction_Vectorcall + 124
32      0x55919ee9e492 PyObject_Call + 290
33      0x55919ee7a5d7 _PyEval_EvalFrameDefault + 10791
34      0x55919ee8f9fc _PyFunction_Vectorcall + 124
35      0x55919ee7826d _PyEval_EvalFrameDefault + 1725
36      0x55919ee749c6 /usr/bin/python3(+0x13f9c6) [0x55919ee749c6]
37      0x55919ef6a256 PyEval_EvalCode + 134
38      0x55919ef95108 /usr/bin/python3(+0x260108) [0x55919ef95108]
39      0x55919ef8e9cb /usr/bin/python3(+0x2599cb) [0x55919ef8e9cb]
40      0x55919ef94e55 /usr/bin/python3(+0x25fe55) [0x55919ef94e55]
41      0x55919ef94338 _PyRun_SimpleFileObject + 424
42      0x55919ef93f83 _PyRun_AnyFileObject + 67
43      0x55919ef86a5e Py_RunMain + 702
44      0x55919ef5d02d Py_BytesMain + 45
45      0x7f4467e70d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f4467e70d90]
46      0x7f4467e70e40 __libc_start_main + 128
47      0x55919ef5cf25 _start + 37
[178bb4bafa40:01791] *** Process received signal ***
[178bb4bafa40:01791] Signal: Aborted (6)
[178bb4bafa40:01791] Signal code:  (-6)
[178bb4bafa40:01791] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f4467e89520]
[178bb4bafa40:01791] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f4467edd9fc]
[178bb4bafa40:01791] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f4467e89476]
[178bb4bafa40:01791] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f4467e6f7f3]
[178bb4bafa40:01791] [ 4] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e)[0x7f4450876b9e]
[178bb4bafa40:01791] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7f445088220c]
[178bb4bafa40:01791] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xad1e9)[0x7f44508811e9]
[178bb4bafa40:01791] [ 7] /lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x99)[0x7f4450881959]
[178bb4bafa40:01791] [ 8] /lib/x86_64-linux-gnu/libgcc_s.so.1(+0x16884)[0x7f4467b79884]
[178bb4bafa40:01791] [ 9] /lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0x12d)[0x7f4467b7a2dd]
[178bb4bafa40:01791] [10] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(_ZN12tensorrt_llm7plugins18GemmPluginProfilerINS_18cutlass_extensions17CutlassGemmConfigESt10shared_ptrINS_7kernels15cutlass_kernels33CutlassFpAIntBGemmRunnerInterfaceEENS0_10GemmIdCoreENS0_14GemmIdCoreHashEE14profileTacticsERKS8_RKN8nvinfer18DataTypeERKNS0_8GemmDimsERKS9_+0xd04)[0x7f4274003e84]
[178bb4bafa40:01791] [11] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(_ZN12tensorrt_llm7plugins27WeightOnlyQuantMatmulPlugin10initializeEv+0x9)[0x7f4273fd8949]
[178bb4bafa40:01791] [12] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0x1065a25)[0x7f443f591a25]
[178bb4bafa40:01791] [13] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xff20aa)[0x7f443f51e0aa]
[178bb4bafa40:01791] [14] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xddefcf)[0x7f443f30afcf]
[178bb4bafa40:01791] [15] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xde107c)[0x7f443f30d07c]
[178bb4bafa40:01791] [16] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xde3071)[0x7f443f30f071]
[178bb4bafa40:01791] [17] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa2861c)[0x7f443ef5461c]
[178bb4bafa40:01791] [18] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa2d837)[0x7f443ef59837]
[178bb4bafa40:01791] [19] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0xa2e1af)[0x7f443ef5a1af]
[178bb4bafa40:01791] [20] /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0xa6478)[0x7f43ea0a6478]
[178bb4bafa40:01791] [21] /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0x457a3)[0x7f43ea0457a3]
[178bb4bafa40:01791] [22] /usr/bin/python3(+0x15a10e)[0x55919ee8f10e]
[178bb4bafa40:01791] [23] /usr/bin/python3(_PyObject_MakeTpCall+0x25b)[0x55919ee85a7b]
[178bb4bafa40:01791] [24] /usr/bin/python3(+0x168acb)[0x55919ee9dacb]
[178bb4bafa40:01791] [25] /usr/bin/python3(_PyEval_EvalFrameDefault+0x614a)[0x55919ee7dcfa]
[178bb4bafa40:01791] [26] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55919ee8f9fc]
[178bb4bafa40:01791] [27] /usr/bin/python3(_PyEval_EvalFrameDefault+0x2a27)[0x55919ee7a5d7]
[178bb4bafa40:01791] [28] /usr/bin/python3(_PyFunction_Vectorcall+0x7c)[0x55919ee8f9fc]
[178bb4bafa40:01791] [29] /usr/bin/python3(_PyEval_EvalFrameDefault+0x8ac)[0x55919ee7845c]
[178bb4bafa40:01791] *** End of error message ***
Aborted (core dumped)

nv-guomingz commented 3 months ago

Did you rebuild TensorRT-LLM from the latest code base?

Tchaikovic commented 3 months ago

No, I pip installed it.

nv-guomingz commented 3 months ago

OK, let me try to reproduce it on my side first.

nv-guomingz commented 3 months ago

Confirmed that T4 doesn't support weight-only mode at the moment. Please try a Volta+ architecture.
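Before starting a build, it can help to check which GPU and compute capability an instance actually exposes. The sketch below is one way to do that, assuming a driver recent enough that `nvidia-smi` offers the `compute_cap` query field. The function names are hypothetical, and the sample line is illustrative rather than output captured from this thread. Which capabilities support weight-only quantization is not encoded here; that should be read from the project's support matrix.

```python
import subprocess


def parse_compute_caps(csv_text: str) -> list[tuple[str, tuple[int, int]]]:
    """Parse 'name, compute_cap' CSV lines into (name, (major, minor)) pairs."""
    gpus = []
    for line in csv_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the last comma so GPU names containing commas stay intact.
        name, cap = (field.strip() for field in line.rsplit(",", 1))
        major, minor = cap.split(".")
        gpus.append((name, (int(major), int(minor))))
    return gpus


def query_gpus() -> list[tuple[str, tuple[int, int]]]:
    """Ask nvidia-smi for each GPU's name and compute capability."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_compute_caps(out)


# Illustrative input: a T4 has compute capability 7.5.
sample = "Tesla T4, 7.5\n"
for name, (major, minor) in parse_compute_caps(sample):
    print(f"{name}: sm_{major}{minor}")
```

On a machine with a GPU, `query_gpus()` returns the same structure for the real devices.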