RuntimeError when using examples/run.py: Assertion failed: mpiSize == tp * pp

System Info

tensorrt 10.0.1
tensorrt-dispatch 10.0.1
tensorrt-lean 10.0.1
tensorrt-llm 0.11.0.dev2024060400

Who can help?

No response

Information

[X] The official example scripts
[ ] My own modified scripts

Tasks

[X] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[ ] My own task or dataset (give details below)

Reproduction

python convert_checkpoint.py --model_dir <qwen2-0.5b> --output_dir <qwen2-0.5b-trt_sq_2gpu> --dtype float16 --smoothquant 0.5 --tp_size 2 --per_token --per_channel

trtllm-build --checkpoint_dir <qwen2-0.5b-trt_sq_2gpu> --output_dir <qwen2-0.5b-trt_sq_2gpu_engine> --gemm_plugin float16

python run.py --max_output_len 128 --tokenizer_dir <qwen2-0.5b> --engine_dir <qwen2-0.5b-trt_sq_2gpu_engine> --input_text hello

Expected behavior

run.py generates output text for the input prompt.

Actual behavior

[06/18/2024-19:33:47] [TRT-LLM] [W] Found pynvml==11.4.1 and cuda driver version b'470.82.01'. Please use pynvml>=11.5.0 and cuda driver>=526 to get accurate memory usage.
[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
  File "run.py", line 504, in <module>
    main(args)
  File "run.py", line 341, in main
    runner = runner_cls.from_dir(**runner_kwargs)
  File "/opt/conda/lib/python3.8/site-packages/tensorrt_llm/runtime/model_runner_cpp.py", line 200, in from_dir
    world_config = WorldConfig.mpi(tensor_parallelism=tp_size,
RuntimeError: [TensorRT-LLM][ERROR] Assertion failed: mpiSize == tp * pp (/mnt/user/208364/projects/TensorRT-LLM/cpp/tensorrt_llm/runtime/worldConfig.cpp:99)
1 0x7fbaefb0a5c3 tensorrt_llm::common::throwRuntimeError(char const*, int, std::string const&) + 82
2 0x7fbaf172178d tensorrt_llm::runtime::WorldConfig::mpi(int, std::optional, std::optional, std::optional<std::vector<int, std::allocator > > const&) + 285
3 0x7fbc445e3bd6 /opt/conda/lib/python3.8/site-packages/tensorrt_llm/bindings.cpython-38-x86_64-linux-gnu.so(+0x5bbd6) [0x7fbc445e3bd6]
4 0x7fbc445ca6eb /opt/conda/lib/python3.8/site-packages/tensorrt_llm/bindings.cpython-38-x86_64-linux-gnu.so(+0x426eb) [0x7fbc445ca6eb]
5 0x4f5572 PyCFunction_Call + 82
6 0x4e0e1b _PyObject_MakeTpCall + 955
7 0x4dd0c6 _PyEval_EvalFrameDefault + 20406
8 0x4d70d1 _PyEval_EvalCodeWithName + 753
9 0x4e823c _PyFunction_Vectorcall + 412
10 0x4f53ae python() [0x4f53ae]
11 0x4f76ce PyObject_Call + 846
12 0x4da183 _PyEval_EvalFrameDefault + 8307
13 0x4d70d1 _PyEval_EvalCodeWithName + 753
14 0x4e823c _PyFunction_Vectorcall + 412
15 0x4d84a9 _PyEval_EvalFrameDefault + 921
16 0x4d70d1 _PyEval_EvalCodeWithName + 753
17 0x585e29 PyEval_EvalCodeEx + 57
18 0x585deb PyEval_EvalCode + 27
19 0x5a5bd1 python() [0x5a5bd1]
20 0x5a4bdf python() [0x5a4bdf]
21 0x45c538 python() [0x45c538]
22 0x45c0d9 PyRun_SimpleFileExFlags + 832
23 0x44fe8f python() [0x44fe8f]
24 0x579e89 Py_BytesMain + 57
25 0x7fbc6668c192 __libc_start_main + 242
26 0x579d3d python() [0x579d3d]
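
For reference, the failing condition compares the number of MPI processes with the product of the tensor-parallel and pipeline-parallel sizes. The snippet below is only a rough sketch of that condition in plain Python, not the actual code in worldConfig.cpp. The function names are hypothetical, and pp is assumed to be 1 because no pipeline-parallel option was passed in the commands above.

```python
def expected_world_size(tp_size: int, pp_size: int = 1) -> int:
    """Number of processes an engine with this parallelism layout expects."""
    return tp_size * pp_size


def check_world_size(mpi_size: int, tp_size: int, pp_size: int = 1) -> None:
    """Hypothetical stand-in for the condition in the error: mpiSize == tp * pp."""
    expected = expected_world_size(tp_size, pp_size)
    if mpi_size != expected:
        raise RuntimeError(
            "Assertion failed: mpiSize == tp * pp "
            f"(mpiSize={mpi_size}, tp={tp_size}, pp={pp_size}, expected={expected})"
        )


# Engine converted with --tp_size 2 (pp assumed to be 1), run.py started as one process.
try:
    check_world_size(mpi_size=1, tp_size=2)
except RuntimeError as err:
    print(err)

# With two processes the same condition holds and nothing is raised.
check_world_size(mpi_size=2, tp_size=2)
```

Under these assumptions, a single `python run.py` process gives mpiSize 1 while the engine expects 2, which matches the assertion in the log.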

Additional notes

N/A
