NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

hang up using mpirun -n 2 #2337

Open Hukongtao opened 1 week ago

Hukongtao commented 1 week ago

Reproduction


set -ex

pip3 install --no-cache-dir tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com

filename=$(basename ${HDFS_MODEL_PATH})
num_devices_per_group=${NUM_DEVICES_PER_GROUP}

python3 convert_checkpoint.py \
    --model_dir ./${filename}/ \
    --output_dir ./tllm_checkpoint_gptq \
    --dtype float16 \
    --use_weight_only \
    --weight_only_precision int4_gptq \
    --per_group \
    --tp_size ${num_devices_per_group}

# max_batch_size:       Maximum number of requests that the engine can schedule
# max_input_len:        Maximum input length of one request
# max_seq_len:          Maximum total length of one request, including prompt and outputs
# max_beam_width:       Maximum number of beams for beam search decoding
# max_num_tokens:       Maximum number of batched input tokens after padding is removed in each batch
python3 build_engine.py \
    --checkpoint_dir ./tllm_checkpoint_gptq \
    --output_dir ./tmp/trt_engines/int4_GPTQ/ \
    --gemm_plugin float16 \
    --max_batch_size 1 \
    --max_input_len 8196 \
    --max_seq_len 8197 \
    --max_beam_width 1 \
    --max_num_tokens 8197 \
    --kv_cache_type disabled

mpirun -n 2 python3 handler_trt.py

The hang occurs at this call: MpiComm.local_init()
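To confirm where each rank is stuck, one option is to arm a watchdog from Python's standard `faulthandler` module before the suspect call. The sketch below is not part of TensorRT-LLM: `hang_watchdog` is a hypothetical helper name, and the 60-second timeout is an arbitrary choice. `faulthandler.dump_traceback_later` is implemented with a watchdog thread, so it is intended to fire even when the main thread is blocked in native code.

```python
import contextlib
import faulthandler
import sys


@contextlib.contextmanager
def hang_watchdog(timeout_s, stream=None):
    """Dump all Python thread stacks to `stream` if the body runs past timeout_s.

    Hypothetical helper for diagnosis only; it does not stop the process.
    `stream` must be a real file (it needs a file descriptor).
    """
    target = sys.stderr if stream is None else stream
    faulthandler.dump_traceback_later(timeout_s, file=target, exit=False)
    try:
        yield
    finally:
        # Disarm the watchdog once the guarded code has returned.
        faulthandler.cancel_dump_traceback_later()


# Usage sketch (the guarded call is whatever code in handler_trt.py hangs):
#
#     with hang_watchdog(60):
#         ...  # code that ends up in MpiComm.local_init()
```

Under `mpirun -n 2`, each rank would write its own traceback to stderr after the timeout, which shows whether both ranks block at the same Python frame.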

Expected behavior

The script runs successfully.

Actual behavior

The process hangs.

Superjomn commented 1 week ago

Can you provide the content of the handler_trt.py? Please give a simple code sample related to TensorRT-LLM for reproduction.

Hukongtao commented 1 week ago

> Can you provide the content of the handler_trt.py? Please give a simple code sample related to TensorRT-LLM for reproduction.

I found that this error has nothing to do with handler_trt.py. It occurs when I install trt-llm in the following way:

pip3 install --no-cache-dir tensorrt_llm -U --pre --extra-index-url https://pypi.nvidia.com

But when I installed trt-llm from source, the error disappeared:

git clone https://github.com/NVIDIA/TensorRT-LLM TensorRT-LLM-py39
# Enter the clone before checking out the commit
cd TensorRT-LLM-py39
git checkout 75057cd036af25e288c004d8ac9e52fd2d6224aa
git submodule update --init --recursive
git lfs install --force
git lfs pull
python3 ./scripts/build_wheel.py --trt_root /usr/local/tensorrt/ -c

Superjomn commented 1 week ago

I looked up commit 75057cd036af25e288c004d8ac9e52fd2d6224aa, and it shows the following:

commit 75057cd036af25e288c004d8ac9e52fd2d6224aa (HEAD -> main, origin/main, origin/HEAD)
Author: Kaiyu Xie <26294424+kaiyux@users.noreply.github.com>
Date:   Tue Oct 15 15:28:40 2024 +0800

So it is the latest bi-weekly release.

Can you share the version of the pip-installed tensorrt_llm? Just run import tensorrt_llm; print(tensorrt_llm.__version__) and post the output.

cc @kaiyux for viz.
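
As an alternative to importing the package, the installed version can also be read from the package metadata with the standard library. This is a minimal sketch: `installed_version` is a hypothetical helper name, and it assumes the distribution is registered as `tensorrt_llm`, the name used in the pip command earlier in the thread. Reading metadata does not execute any of the package's code, which may be convenient if the import itself is under suspicion, although the thread does not establish that it is.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumed distribution name, taken from the pip install command in the thread.
    print("tensorrt_llm:", installed_version("tensorrt_llm"))
```

Running this in both the pip-installed and the source-built environment would give two version strings to compare.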