Open pankajroark opened 3 months ago
Could you try the following version?
pip install tensorrt_llm==0.11.0.dev2024061100
Please also provide the invoking scripts.
Thanks
Tried with 0.11.0.dev2024061100, and the issue still persists.
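To confirm which build is actually active in the environment before re-running, a minimal check using only the Python standard library might look like the following. The helper name installed_version is hypothetical, and the distribution name is assumed to match the one used in the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="tensorrt_llm"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version pip installed, or None when the package is missing.
print(installed_version())
```

If the printed value differs from the version requested in the pip command, the environment is probably still picking up an older install.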
Invoking script (from examples/llama in the TensorRT-LLM git repo):
Build engine:
BASE_LLAMA_MODEL=mistral
huggingface-cli download mistralai/Mistral-7B-Instruct-v0.2 --local-dir $BASE_LLAMA_MODEL
python3 convert_checkpoint.py --model_dir ${BASE_LLAMA_MODEL} \
--output_dir ./tllm_checkpoint_1gpu \
--dtype float16
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu \
--output_dir ./engine/mistral/i1600-o600-bs96-tp1-fp16-lora \
--gemm_plugin float16 \
--max_batch_size 96 \
--max_input_len 1600 \
--max_output_len 600 \
--gpt_attention_plugin float16 \
--paged_kv_cache enable \
--remove_input_padding enable \
--use_paged_context_fmha enable \
--use_custom_all_reduce disable \
--lora_plugin float16 \
--lora_target_modules attn_q attn_k attn_v attn_dense \
--max_lora_rank 16
Build LoRA and invoke:
huggingface-cli download Tsukitsune/alpaca_7b_lora --local-dir lora
python3 ../hf_lora_convert.py -i lora -o Tsukitsune-alpaca_7b_lora-weights --storage-type float16
python3 ../run.py --max_output_len=50 \
--tokenizer_dir ./mistral/ \
--engine_dir=./engine/mistral/i1600-o600-bs96-tp1-fp16-lora \
--use_py_session \
--lora_task_uids=0 \
--lora_dir=lora
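Before invoking run.py, it may be worth checking what the downloaded adapter declares against the build flags above. The sketch below is not part of the TensorRT-LLM examples. It assumes the lora directory contains a PEFT-style adapter_config.json with the usual r, target_modules, and base_model_name_or_path fields, and the function name summarize_adapter is hypothetical. It only reports what the adapter declares and compares its rank with the value passed to --max_lora_rank.

```python
import json
from pathlib import Path


def summarize_adapter(lora_dir, max_lora_rank=16):
    """Summarize a PEFT-style adapter_config.json (assumed layout, see lead-in)."""
    config_path = Path(lora_dir) / "adapter_config.json"
    config = json.loads(config_path.read_text())

    rank = config.get("r")
    targets = config.get("target_modules") or []
    # PEFT allows target_modules to be a single string as well as a list.
    if isinstance(targets, str):
        targets = [targets]

    return {
        # Base model the adapter was trained against, as declared by the adapter.
        "base_model": config.get("base_model_name_or_path"),
        "rank": rank,
        "target_modules": sorted(targets),
        # True only when the adapter rank does not exceed the engine's limit.
        "rank_fits_engine": rank is not None and rank <= max_lora_rank,
    }
```

Calling summarize_adapter("lora") after the huggingface-cli download step returns a small dictionary. If base_model names a different architecture than the engine's base checkpoint, or rank_fits_engine is False, that mismatch is a candidate to rule out before digging into the GEMM failure itself.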
The error is:
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
what(): [TensorRT-LLM][ERROR] CUDA runtime error in cublasLtMatmul(getCublasLtHandle(), mOperationDesc, alpha, A, mADesc, B, mBDesc, beta, C, mCDesc, C, mCDesc, (hasAlgo ? (&algo) : NULL), mCublasWorkspace, workspaceSize, mStream): CUBLAS_STATUS_NOT_SUPPORTED (/home/jenkins/agent/workspace/LLM/main/L0_PostMerge/tensorrt_llm/cpp/tensorrt_llm/common/cublasMMWrapper.cpp:157)
1 0x7f6ba0718329 void tensorrt_llm::common::check<cublasStatus_t>(cublasStatus_t, char const*, char const*, int) + 121
2 0x7f6ba0716a79 tensorrt_llm::common::CublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, void const*, int, void const*, int, void*, int, float, float, cublasLtMatmulAlgo_t const&, bool, bool) + 281
3 0x7f6ba0717004 tensorrt_llm::common::CublasMMWrapper::Gemm(cublasOperation_t, cublasOperation_t, int, int, int, void const*, int, void const*, int, void*, int, float, float, std::optional<cublasLtMatmulHeuristicResult_t> const&) + 84
4 0x7f6b554bca1f /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x11da1f) [0x7f6b554bca1f]
5 0x7f6b554bd632 tensorrt_llm::plugins::GemmPlugin::enqueue(nvinfer1::PluginTensorDesc const*, nvinfer1::PluginTensorDesc const*, void const* const*, void* const*, void*, CUstream_st*) + 2450
6 0x7f6c729c7a8c /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0x109fa8c) [0x7f6c729c7a8c]
7 0x7f6c7296c657 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0x1044657) [0x7f6c7296c657]
8 0x7f6c7296e0c1 /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0x10460c1) [0x7f6c7296e0c1]
9 0x7f6c1d6a48f0 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0xa48f0) [0x7f6c1d6a48f0]
10 0x7f6c1d6458f3 /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0x458f3) [0x7f6c1d6458f3]
11 0x55f7e2cac10e python3(+0x15a10e) [0x55f7e2cac10e]
12 0x55f7e2ca2a7b _PyObject_MakeTpCall + 603
13 0x55f7e2cbaacb python3(+0x168acb) [0x55f7e2cbaacb]
14 0x55f7e2c9acfa _PyEval_EvalFrameDefault + 24906
15 0x55f7e2cac9fc _PyFunction_Vectorcall + 124
16 0x55f7e2c9545c _PyEval_EvalFrameDefault + 2220
17 0x55f7e2cba93e python3(+0x16893e) [0x55f7e2cba93e]
18 0x55f7e2c975d7 _PyEval_EvalFrameDefault + 10791
19 0x55f7e2cba93e python3(+0x16893e) [0x55f7e2cba93e]
20 0x55f7e2c975d7 _PyEval_EvalFrameDefault + 10791
21 0x55f7e2cac9fc _PyFunction_Vectorcall + 124
22 0x55f7e2cbb492 PyObject_Call + 290
23 0x55f7e2c975d7 _PyEval_EvalFrameDefault + 10791
24 0x55f7e2cba7f1 python3(+0x1687f1) [0x55f7e2cba7f1]
25 0x55f7e2cbb492 PyObject_Call + 290
26 0x55f7e2c975d7 _PyEval_EvalFrameDefault + 10791
27 0x55f7e2cba7f1 python3(+0x1687f1) [0x55f7e2cba7f1]
28 0x55f7e2cbb492 PyObject_Call + 290
29 0x55f7e2c975d7 _PyEval_EvalFrameDefault + 10791
30 0x55f7e2cac9fc _PyFunction_Vectorcall + 124
31 0x55f7e2c9526d _PyEval_EvalFrameDefault + 1725
32 0x55f7e2c919c6 python3(+0x13f9c6) [0x55f7e2c919c6]
33 0x55f7e2d87256 PyEval_EvalCode + 134
34 0x55f7e2db2108 python3(+0x260108) [0x55f7e2db2108]
35 0x55f7e2dab9cb python3(+0x2599cb) [0x55f7e2dab9cb]
36 0x55f7e2db1e55 python3(+0x25fe55) [0x55f7e2db1e55]
37 0x55f7e2db1338 _PyRun_SimpleFileObject + 424
38 0x55f7e2db0f83 _PyRun_AnyFileObject + 67
39 0x55f7e2da3a5e Py_RunMain + 702
40 0x55f7e2d7a02d Py_BytesMain + 45
41 0x7f6dd77c8d90 /lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f6dd77c8d90]
42 0x7f6dd77c8e40 __libc_start_main + 128
43 0x55f7e2d79f25 _start + 37
[a86872ebeb62:11430] *** Process received signal ***
[a86872ebeb62:11430] Signal: Aborted (6)
[a86872ebeb62:11430] Signal code: (-6)
[a86872ebeb62:11430] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f6dd77e1520]
[a86872ebeb62:11430] [ 1] /lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f6dd78359fc]
[a86872ebeb62:11430] [ 2] /lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f6dd77e1476]
[a86872ebeb62:11430] [ 3] /lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f6dd77c77f3]
[a86872ebeb62:11430] [ 4] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e)[0x7f6d35076b9e]
[a86872ebeb62:11430] [ 5] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7f6d3508220c]
[a86872ebeb62:11430] [ 6] /lib/x86_64-linux-gnu/libstdc++.so.6(+0xad1e9)[0x7f6d350811e9]
[a86872ebeb62:11430] [ 7] /lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x99)[0x7f6d35081959]
[a86872ebeb62:11430] [ 8] /lib/x86_64-linux-gnu/libgcc_s.so.1(+0x16884)[0x7f6dd74d1884]
[a86872ebeb62:11430] [ 9] /lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0x12d)[0x7f6dd74d22dd]
[a86872ebeb62:11430] [10] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(+0x7205dd)[0x7f6ba05f85dd]
[a86872ebeb62:11430] [11] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libtensorrt_llm.so(_ZN12tensorrt_llm6common15CublasMMWrapper4GemmE17cublasOperation_tS2_iiiPKviS4_iPviffRKSt8optionalI31cublasLtMatmulHeuristicResult_tE+0x54)[0x7f6ba0717004]
[a86872ebeb62:11430] [12] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x11da1f)[0x7f6b554bca1f]
[a86872ebeb62:11430] [13] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(_ZN12tensorrt_llm7plugins10GemmPlugin7enqueueEPKN8nvinfer116PluginTensorDescES5_PKPKvPKPvSA_P11CUstream_st+0x992)[0x7f6b554bd632]
[a86872ebeb62:11430] [14] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0x109fa8c)[0x7f6c729c7a8c]
[a86872ebeb62:11430] [15] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0x1044657)[0x7f6c7296c657]
[a86872ebeb62:11430] [16] /usr/local/lib/python3.10/dist-packages/tensorrt_libs/libnvinfer.so.10(+0x10460c1)[0x7f6c7296e0c1]
[a86872ebeb62:11430] [17] /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0xa48f0)[0x7f6c1d6a48f0]
[a86872ebeb62:11430] [18] /usr/local/lib/python3.10/dist-packages/tensorrt_bindings/tensorrt.so(+0x458f3)[0x7f6c1d6458f3]
[a86872ebeb62:11430] [19] python3(+0x15a10e)[0x55f7e2cac10e]
[a86872ebeb62:11430] [20] python3(_PyObject_MakeTpCall+0x25b)[0x55f7e2ca2a7b]
[a86872ebeb62:11430] [21] python3(+0x168acb)[0x55f7e2cbaacb]
[a86872ebeb62:11430] [22] python3(_PyEval_EvalFrameDefault+0x614a)[0x55f7e2c9acfa]
[a86872ebeb62:11430] [23] python3(_PyFunction_Vectorcall+0x7c)[0x55f7e2cac9fc]
[a86872ebeb62:11430] [24] python3(_PyEval_EvalFrameDefault+0x8ac)[0x55f7e2c9545c]
[a86872ebeb62:11430] [25] python3(+0x16893e)[0x55f7e2cba93e]
[a86872ebeb62:11430] [26] python3(_PyEval_EvalFrameDefault+0x2a27)[0x55f7e2c975d7]
[a86872ebeb62:11430] [27] python3(+0x16893e)[0x55f7e2cba93e]
[a86872ebeb62:11430] [28] python3(_PyEval_EvalFrameDefault+0x2a27)[0x55f7e2c975d7]
[a86872ebeb62:11430] [29] python3(_PyFunction_Vectorcall+0x7c)[0x55f7e2cac9fc]
[a86872ebeb62:11430] *** End of error message ***
cc @hijkzzz
Please note that I've provided the requested information. The issue is still labeled as waiting for feedback.
We are working on solving the issue
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
System Info
x86_64, NVIDIA A100 80GB, TensorRT-LLM v0.10.0
Who can help?
@ncomly-nvidia
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
python convert_checkpoint.py --model_dir ${BASE_LLAMA_MODEL} \
--output_dir ./tllm_checkpoint_1gpu \
--dtype float16
trtllm-build --checkpoint_dir ./tllm_checkpoint_1gpu \
--output_dir ./engine/mistral/i1600-o600-bs96-tp1-fp16-lora \
--gemm_plugin float16 \
--max_batch_size 96 \
--max_input_len 1600 \
--max_output_len 600 \
--gpt_attention_plugin float16 \
--paged_kv_cache enable \
--remove_input_padding enable \
--use_paged_context_fmha enable \
--use_custom_all_reduce disable \
--lora_plugin float16 \
--lora_target_modules attn_q attn_k attn_v attn_dense \
--max_lora_rank 16