System Info
Who can help?
No response
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
python convert_checkpoint.py --model_dir <qwen2-0.5b> --output_dir <qwen2-0.5b-trt_sq_2gpu> --dtype float16 --smoothquant 0.5 --tp_size 2 --per_token --per_channel
trtllm-build --checkpoint_dir <qwen2-0.5b-trt_sq_2gpu> --output_dir <qwen2-0.5b-trt_sq_2gpu_engine> --gemm_plugin float16
python run.py --max_output_len 128 --tokenizer_dir <qwen2-0.5b> --engine_dir <qwen2-0.5b-trt_sq_2gpu_engine> --input_text hello
Expected behavior
run.py generates output text for the input prompt.
actual behavior
[06/18/2024-19:33:47] [TRT-LLM] [W] Found pynvml==11.4.1 and cuda driver version b'470.82.01'. Please use pynvml>=11.5.0 and cuda driver>=526 to get accurate memory usage.
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "run.py", line 504, in <module>
main(args)
File "run.py", line 341, in main
runner = runner_cls.from_dir(**runner_kwargs)
File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 200, in from_dir
world_config = WorldConfig.mpi(tensor_parallelism=tp_size,
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: mpiSize == tp * pp (/mnt/user/208364/projects/TensorRT-LLM/cpp/tensorrt_llm/runtime/worldConfig.cpp:99)
1 0x7fbaefb0a5c3 tensorrt_llm::common::throwRuntimeError(char const*, int, std::string const&) + 82
2 0x7fbaf172178d tensorrt_llm::runtime::WorldConfig::mpi(int, std::optional<int>, std::optional<int>, std::optional<std::vector<int, std::allocator<int> > > const&) + 285
3 0x7fbc445e3bd6 /opt/conda/lib/python3.8/site-packages/tensorrt_llm/bindings.cpython-38-x86_64-linux-gnu.so(+0x5bbd6) [0x7fbc445e3bd6]
4 0x7fbc445ca6eb /opt/conda/lib/python3.8/site-packages/tensorrt_llm/bindings.cpython-38-x86_64-linux-gnu.so(+0x426eb) [0x7fbc445ca6eb]
5 0x4f5572 PyCFunction_Call + 82
6 0x4e0e1b _PyObject_MakeTpCall + 955
7 0x4dd0c6 _PyEval_EvalFrameDefault + 20406
8 0x4d70d1 _PyEval_EvalCodeWithName + 753
9 0x4e823c _PyFunction_Vectorcall + 412
10 0x4f53ae python() [0x4f53ae]
11 0x4f76ce PyObject_Call + 846
12 0x4da183 _PyEval_EvalFrameDefault + 8307
13 0x4d70d1 _PyEval_EvalCodeWithName + 753
14 0x4e823c _PyFunction_Vectorcall + 412
15 0x4d84a9 _PyEval_EvalFrameDefault + 921
16 0x4d70d1 _PyEval_EvalCodeWithName + 753
17 0x585e29 PyEval_EvalCodeEx + 57
18 0x585deb PyEval_EvalCode + 27
19 0x5a5bd1 python() [0x5a5bd1]
20 0x5a4bdf python() [0x5a4bdf]
21 0x45c538 python() [0x45c538]
22 0x45c0d9 PyRun_SimpleFileExFlags + 832
23 0x44fe8f python() [0x44fe8f]
24 0x579e89 Py_BytesMain + 57
25 0x7fbc6668c192 __libc_start_main + 242
26 0x579d3d python() [0x579d3d]
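For readers of the trace above, here is a minimal Python sketch of what the failing check appears to require: the number of MPI processes must equal tensor parallelism times pipeline parallelism. It assumes the engine was built with --tp_size 2 and a pipeline parallelism of 1, and that starting run.py with plain python yields an MPI world size of 1. The functions check_world_size and launch_command are hypothetical helpers for illustration only; they are not part of TensorRT-LLM.

```python
from typing import List


def check_world_size(mpi_size: int, tp_size: int, pp_size: int = 1) -> None:
    """Illustrative mirror of the assertion `mpiSize == tp * pp`.

    Hypothetical helper: raises if the number of launched processes does not
    match the number of ranks the engine was built for.
    """
    expected = tp_size * pp_size
    if mpi_size != expected:
        raise RuntimeError(
            "Assertion failed: mpiSize == tp * pp "
            f"(mpiSize={mpi_size}, tp={tp_size}, pp={pp_size})"
        )


def launch_command(script_args: List[str], tp_size: int, pp_size: int = 1) -> List[str]:
    """Build a launcher command with one process per rank (hypothetical helper)."""
    world_size = tp_size * pp_size
    if world_size == 1:
        # A single rank needs no MPI launcher.
        return ["python", *script_args]
    # Multiple ranks: start one Python process per rank through mpirun.
    return ["mpirun", "-n", str(world_size), "python", *script_args]


if __name__ == "__main__":
    # Plain `python run.py` against a tp_size=2 engine: one process, two ranks expected.
    try:
        check_world_size(mpi_size=1, tp_size=2)
    except RuntimeError as err:
        print(err)

    # The same script launched with a matching number of processes.
    print(" ".join(launch_command(["run.py", "--max_output_len", "128"], tp_size=2)))
```

Under these assumptions, the reproduction step that calls run.py would need to be started through an MPI launcher with two processes (for example mpirun -n 2 python run.py ...) so that the world size matches the engine's tp_size.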
additional notes
N/A