Open naphatkps opened 3 months ago
@nekorobov would you please take a look at it?
@Naphat-Khoprasertthaworn thank you for reporting the problem. This failure happens due to a lack of memory for the GEMM plugin profiler. You can try reducing the number of tokens and the batch size, e.g. set --max_num_tokens 1024 and/or --max_batch_size 32. Let me know if it helps.
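A minimal sketch of that suggestion, assuming the same paths and flags as the trtllm-build command in the reproduction section of this issue. The helper name `build_command` is hypothetical, and the values 1024 and 32 are only the examples given in the comment above, not tuned recommendations. The script just assembles and prints the command; it does not run the build.

```python
import shlex

# Flags and paths copied from the reproduction command in this issue.
BASE_ARGS = [
    "trtllm-build",
    "--checkpoint_dir", "./tmp/seallm/7B/checkpoint_merged_LoRA/int8/1-gpu/",
    "--output_dir", "./tmp/seallm/7B/engine_merged_LoRA/int8/1-gpu/",
    "--max_beam_width", "3",
    "--gemm_plugin", "float16",
    "--context_fmha", "enable",
    "--paged_kv_cache", "enable",
    "--remove_input_padding", "enable",
    "--gpt_attention_plugin", "float16",
    "--multi_block_mode", "enable",
    "--use_paged_context_fmha", "enable",
]


def build_command(max_num_tokens=1024, max_batch_size=32):
    """Return the argument list with the reduced limits appended.

    `build_command` is a hypothetical helper for illustration; only the two
    appended flags come from the maintainer's suggestion.
    """
    return BASE_ARGS + [
        "--max_num_tokens", str(max_num_tokens),
        "--max_batch_size", str(max_batch_size),
    ]


if __name__ == "__main__":
    # Print a copy-pasteable shell command instead of executing it.
    print(shlex.join(build_command()))
```

If the profiler still runs out of memory, the same helper can be called with smaller values, for example `build_command(max_num_tokens=512, max_batch_size=16)`.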
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.
System Info
rel
Who can help?
@Tracin @byshiue
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
make -C docker release_build CUDA_ARCHS="86-real"
make -C docker release_run
cd /code/tensorrt_llm
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel
import torch

base_model_name_or_path = "SeaLLMs/SeaLLM-7B-v2"
adaptor_model_path = "./model_weights/lora_adaptor/ft_model"
new_model_path = "./model_weights/seallm_with_lora"

# Load the base model in float16.
model_for_merge = AutoModelForCausalLM.from_pretrained(
    base_model_name_or_path,
    torch_dtype=torch.float16,
)

# Attach the LoRA adaptor, then merge its weights into the base model.
full_model = PeftModel.from_pretrained(
    model_for_merge,
    model_id=adaptor_model_path,
)
full_model = full_model.base_model.merge_and_unload()
full_model.save_pretrained(new_model_path)

# Save the tokenizer next to the merged weights.
tokenizer = AutoTokenizer.from_pretrained(
    base_model_name_or_path,
    padding_side='left',
    add_eos_token=True,
    add_bos_token=True,
    trust_remote_code=True,
)
tokenizer.save_pretrained(new_model_path)
python convert_checkpoint.py --model_dir ../../model_weights/seallm_with_lora_pk \
    --output_dir ./tmp/seallm/7B/checkpoint_merged_LoRA/int8/1-gpu/ \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int8

trtllm-build --checkpoint_dir ./tmp/seallm/7B/checkpoint_merged_LoRA/int8/1-gpu/ \
    --output_dir ./tmp/seallm/7B/engine_merged_LoRA/int8/1-gpu/ \
    --max_beam_width 3 \
    --gemm_plugin float16 \
    --context_fmha enable \
    --paged_kv_cache enable \
    --remove_input_padding enable \
    --gpt_attention_plugin float16 \
    --multi_block_mode enable \
    --use_paged_context_fmha enable
.... ThreadblockSwizzle_ = cutlass::gemm::threadblock::GemmIdentityThreadblockSwizzle<1>; KernelArch = cutlass::arch::Sm89; __nv_bool SplitKSerial = true] not implemented
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=1, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=2, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=4, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=8, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=16, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=32, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=64, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=128, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=256, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=512, n=6144, k=4096). Will try to use default or fail at runtime
[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape (m=1024, n=6144, k=4096). Will try to use default or fail at runtime
terminate called after throwing an instance of 'tensorrt_llm::common::TllmException'
what(): [TensorRT-LLM][ERROR] Assertion failed: Can't free tmp workspace for GEMM tactics profiling. (/src/tensorrt_llm/cpp/tensorrt_llm/plugins/common/gemmPluginProfiler.cpp:181)
1 0x7f74ca703242 /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(+0x57242) [0x7f74ca703242]
2 0x7f74ca7c1198 tensorrt_llm::plugins::GemmPluginProfiler<tensorrt_llm::cutlass_extensions::CutlassGemmConfig, std::shared_ptr<tensorrt_llm::kernels::cutlass_kernels::CutlassFpAIntBGemmRunnerInterface>, tensorrt_llm::plugins::GemmIdCore, tensorrt_llm::plugins::GemmIdCoreHash>::freeTmpData() + 104
3 0x7f74ca7ca9ca tensorrt_llm::plugins::GemmPluginProfiler<tensorrt_llm::cutlass_extensions::CutlassGemmConfig, std::shared_ptr<tensorrt_llm::kernels::cutlass_kernels::CutlassFpAIntBGemmRunnerInterface>, tensorrt_llm::plugins::GemmIdCore, tensorrt_llm::plugins::GemmIdCoreHash>::profileTactics(std::shared_ptr<tensorrt_llm::kernels::cutlass_kernels::CutlassFpAIntBGemmRunnerInterface> const&, nvinfer1::DataType const&, tensorrt_llm::plugins::GemmDims const&, tensorrt_llm::plugins::GemmIdCore const&) + 1066
4 0x7f74ca7a0cbd tensorrt_llm::plugins::WeightOnlyQuantMatmulPlugin::initialize() + 13
5 0x7f7608599a25 /usr/local/tensorrt/lib/libnvinfer.so.10(+0x1065a25) [0x7f7608599a25]
6 0x7f76085260aa /usr/local/tensorrt/lib/libnvinfer.so.10(+0xff20aa) [0x7f76085260aa]
7 0x7f7608312fcf /usr/local/tensorrt/lib/libnvinfer.so.10(+0xddefcf) [0x7f7608312fcf]
8 0x7f760831507c /usr/local/tensorrt/lib/libnvinfer.so.10(+0xde107c) [0x7f760831507c]
9 0x7f7608317071 /usr/local/tensorrt/lib/libnvinfer.so.10(+0xde3071) [0x7f7608317071]
10 0x7f7607f5c61c /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa2861c) [0x7f7607f5c61c]
11 0x7f7607f61837 /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa2d837) [0x7f7607f61837]
12 0x7f7607f621af /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa2e1af) [0x7f7607f621af]
13 0x7f7615ea6478 /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0xa6478) [0x7f7615ea6478]
14 0x7f7615e457a3 /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0x457a3) [0x7f7615e457a3]
15 0x55afc1b9810e /usr/bin/python(+0x15a10e) [0x55afc1b9810e]
16 0x55afc1b8ea7b _PyObject_MakeTpCall + 603
17 0x55afc1ba6acb /usr/bin/python(+0x168acb) [0x55afc1ba6acb]
18 0x55afc1b86cfa _PyEval_EvalFrameDefault + 24906
19 0x55afc1b989fc _PyFunction_Vectorcall + 124
20 0x55afc1b835d7 _PyEval_EvalFrameDefault + 10791
21 0x55afc1b989fc _PyFunction_Vectorcall + 124
22 0x55afc1b8145c _PyEval_EvalFrameDefault + 2220
23 0x55afc1b989fc _PyFunction_Vectorcall + 124
24 0x55afc1b8126d _PyEval_EvalFrameDefault + 1725
25 0x55afc1b989fc _PyFunction_Vectorcall + 124
26 0x55afc1ba7492 PyObject_Call + 290
27 0x55afc1b835d7 _PyEval_EvalFrameDefault + 10791
28 0x55afc1b989fc _PyFunction_Vectorcall + 124
29 0x55afc1ba7492 PyObject_Call + 290
30 0x55afc1b835d7 _PyEval_EvalFrameDefault + 10791
31 0x55afc1b989fc _PyFunction_Vectorcall + 124
32 0x55afc1ba7492 PyObject_Call + 290
33 0x55afc1b835d7 _PyEval_EvalFrameDefault + 10791
34 0x55afc1b989fc _PyFunction_Vectorcall + 124
35 0x55afc1b8126d _PyEval_EvalFrameDefault + 1725
36 0x55afc1b7d9c6 /usr/bin/python(+0x13f9c6) [0x55afc1b7d9c6]
37 0x55afc1c73256 PyEval_EvalCode + 134
38 0x55afc1c9e108 /usr/bin/python(+0x260108) [0x55afc1c9e108]
39 0x55afc1c979cb /usr/bin/python(+0x2599cb) [0x55afc1c979cb]
40 0x55afc1c9de55 /usr/bin/python(+0x25fe55) [0x55afc1c9de55]
41 0x55afc1c9d338 _PyRun_SimpleFileObject + 424
42 0x55afc1c9cf83 _PyRun_AnyFileObject + 67
43 0x55afc1c8fa5e Py_RunMain + 702
44 0x55afc1c6602d Py_BytesMain + 45
45 0x7f767409ed90 /usr/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f767409ed90]
46 0x7f767409ee40 __libc_start_main + 128
47 0x55afc1c65f25 _start + 37
[cu-set-release:61857] Process received signal
[cu-set-release:61857] Signal: Aborted (6)
[cu-set-release:61857] Signal code: (-6)
[cu-set-release:61857] [ 0] /usr/lib/x86_64-linux-gnu/libc.so.6(+0x42520)[0x7f76740b7520]
[cu-set-release:61857] [ 1] /usr/lib/x86_64-linux-gnu/libc.so.6(pthread_kill+0x12c)[0x7f767410b9fc]
[cu-set-release:61857] [ 2] /usr/lib/x86_64-linux-gnu/libc.so.6(raise+0x16)[0x7f76740b7476]
[cu-set-release:61857] [ 3] /usr/lib/x86_64-linux-gnu/libc.so.6(abort+0xd3)[0x7f767409d7f3]
[cu-set-release:61857] [ 4] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xa2b9e)[0x7f7604c0db9e]
[cu-set-release:61857] [ 5] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xae20c)[0x7f7604c1920c]
[cu-set-release:61857] [ 6] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xad1e9)[0x7f7604c181e9]
[cu-set-release:61857] [ 7] /usr/lib/x86_64-linux-gnu/libstdc++.so.6(__gxx_personality_v0+0x99)[0x7f7604c18959]
[cu-set-release:61857] [ 8] /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(+0x16884)[0x7f7673c88884]
[cu-set-release:61857] [ 9] /usr/lib/x86_64-linux-gnu/libgcc_s.so.1(_Unwind_Resume+0x12d)[0x7f7673c892dd]
[cu-set-release:61857] [10] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(_ZN12tensorrt_llm7plugins18GemmPluginProfilerINS_18cutlass_extensions17CutlassGemmConfigESt10shared_ptrINS_7kernels15cutlass_kernels33CutlassFpAIntBGemmRunnerInterfaceEENS0_10GemmIdCoreENS0_14GemmIdCoreHashEE14profileTacticsERKS8_RKN8nvinfer18DataTypeERKNS0_8GemmDimsERKS9_+0x75c)[0x7f74ca7cacfc]
[cu-set-release:61857] [11] /usr/local/lib/python3.10/dist-packages/tensorrt_llm/libs/libnvinfer_plugin_tensorrt_llm.so(_ZN12tensorrt_llm7plugins27WeightOnlyQuantMatmulPlugin10initializeEv+0xd)[0x7f74ca7a0cbd]
[cu-set-release:61857] [12] /usr/local/tensorrt/lib/libnvinfer.so.10(+0x1065a25)[0x7f7608599a25]
[cu-set-release:61857] [13] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xff20aa)[0x7f76085260aa]
[cu-set-release:61857] [14] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xddefcf)[0x7f7608312fcf]
[cu-set-release:61857] [15] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xde107c)[0x7f760831507c]
[cu-set-release:61857] [16] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xde3071)[0x7f7608317071]
[cu-set-release:61857] [17] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa2861c)[0x7f7607f5c61c]
[cu-set-release:61857] [18] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa2d837)[0x7f7607f61837]
[cu-set-release:61857] [19] /usr/local/tensorrt/lib/libnvinfer.so.10(+0xa2e1af)[0x7f7607f621af]
[cu-set-release:61857] [20] /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0xa6478)[0x7f7615ea6478]
[cu-set-release:61857] [21] /usr/local/lib/python3.10/dist-packages/tensorrt/tensorrt.so(+0x457a3)[0x7f7615e457a3]
[cu-set-release:61857] [22] /usr/bin/python(+0x15a10e)[0x55afc1b9810e]
[cu-set-release:61857] [23] /usr/bin/python(_PyObject_MakeTpCall+0x25b)[0x55afc1b8ea7b]
[cu-set-release:61857] [24] /usr/bin/python(+0x168acb)[0x55afc1ba6acb]
[cu-set-release:61857] [25] /usr/bin/python(_PyEval_EvalFrameDefault+0x614a)[0x55afc1b86cfa]
[cu-set-release:61857] [26] /usr/bin/python(_PyFunction_Vectorcall+0x7c)[0x55afc1b989fc]
[cu-set-release:61857] [27] /usr/bin/python(_PyEval_EvalFrameDefault+0x2a27)[0x55afc1b835d7]
[cu-set-release:61857] [28] /usr/bin/python(_PyFunction_Vectorcall+0x7c)[0x55afc1b989fc]
[cu-set-release:61857] [29] /usr/bin/python(_PyEval_EvalFrameDefault+0x8ac)[0x55afc1b8145c]
[cu-set-release:61857] End of error message
Aborted (core dumped)
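For triaging a build log like the one above, the following is a rough sketch that collects every (m, n, k) shape for which the profiler reported no valid GEMM config. The function name `failed_gemm_shapes` and the regular expression are assumptions written against the warning text as it appears in this log; they are not part of TensorRT-LLM.

```python
import re

# Pattern written against the warning lines quoted in this issue.
WARNING_RE = re.compile(
    r"Have not found any valid GEMM config for shape "
    r"\(m=(\d+), n=(\d+), k=(\d+)\)"
)


def failed_gemm_shapes(log_text):
    """Return a list of (m, n, k) tuples, one per matching warning line."""
    return [
        tuple(int(value) for value in match.groups())
        for match in WARNING_RE.finditer(log_text)
    ]


# Two warning lines copied from the log in this issue.
sample = (
    "[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape "
    "(m=1, n=6144, k=4096). Will try to use default or fail at runtime\n"
    "[TensorRT-LLM][WARNING] Have not found any valid GEMM config for shape "
    "(m=1024, n=6144, k=4096). Will try to use default or fail at runtime\n"
)

if __name__ == "__main__":
    for m, n, k in failed_gemm_shapes(sample):
        print(f"no valid GEMM config: m={m}, n={n}, k={k}")
```

In this log every profiled m from 1 to 1024 fails for the same n=6144, k=4096 weight shape, so the whole profiling range is affected rather than one large shape.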