NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Error: MPI worldSize is expected to be equal to tp*pp when participantIds are not specified #2058

Open · ydm-amazon opened this issue 3 months ago

ydm-amazon commented 3 months ago

System Info

AWS g5.12xlarge (4 x NVIDIA A10G GPUs), x86_64 CPU, TensorRT-LLM v0.11.0

Who can help?

@byshiue

Reproduction

As shown in the commands below, I am using TP 4 and PP 1 (hence a world size of 4). The model that fails is Gemma-7B.

python3 convert_checkpoint.py --dtype float16 --world-size 4 --model-dir /tmp/.djl.ai/download/37ada4b27fdc4767ab3c7677e281a42fbdb95a90 --output-model-dir /tmp/trtllm_gemma_ckpt/ --ckpt-type hf
trtllm-build --tp_size 4 --pp_size 1 --checkpoint_dir /tmp/trtllm_gemma_ckpt/ --log_level info --gemm_plugin float16 --output_dir /tmp/.djl.ai/trtllm/194114b4749e0bcd195044eff1e74c328b917953/37ada4b27fdc4767ab3c7677e281a42fbdb95a90/1 --workers 4 --gpt_attention_plugin float16 --paged_kv_cache enable --context_fmha enable --max_beam_width 1 --remove_input_padding enable --use_custom_all_reduce disable --use_paged_context_fmha enable --use_fp8_context_fmha disable --max_batch_size 16 --max_input_len 1024 --max_seq_len 1024 --use_fused_mlp
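
A quick way to sanity-check which parallelism the converted checkpoint recorded is to read its config.json (a sketch, assuming the Gemma converter writes a "mapping" block like the unified TensorRT-LLM checkpoint format; it may store this differently):

import json

# Inspect the converted checkpoint's config; the path is the --output-model-dir above.
with open("/tmp/trtllm_gemma_ckpt/config.json") as f:
    cfg = json.load(f)

# "mapping" is an assumption based on the unified checkpoint format; with a
# world size of 4 one would hope to see tp_size=4, pp_size=1 here.
print(cfg.get("mapping", {}))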

Expected behavior

MPI should not error out, since world size is consistent with TP and PP.

Actual behavior

The software fails with the following error:

Assertion failed: With communicationMode kLEADER, MPI worldSize is expected to be equal to tp*pp when participantIds are not specified (/home/jenkins/agent/workspace/LLM/release-0.11/L0_PostMerge/llm/cpp/tensorrt_llm/executor/executorImpl.cpp:435)
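
For illustration, the check that assertion describes boils down to the following (a Python sketch using mpi4py, not the actual C++ executor code):

from mpi4py import MPI

tp_size, pp_size = 4, 1                 # the values passed to trtllm-build
world_size = MPI.COMM_WORLD.Get_size()  # the number of ranks mpirun launches

# The executor asserts this whenever participantIds are not specified.
assert world_size == tp_size * pp_size, (
    f"world_size={world_size}, expected tp*pp={tp_size * pp_size}"
)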

Additional notes

A similar issue is mentioned in https://github.com/NVIDIA/TensorRT-LLM/issues/2021, but that one is marked as 'not a bug', so I created a new issue. The answer in that issue does not apply to this one.

LanceB57 commented 3 months ago

Maybe you have to add --tp_size 4 to your convert_checkpoint.py command?

What command causes the error to occur? If you're running the model, are you using mpirun -n 4 ... or otherwise specifying that the world size should be 4?
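
For reference, a 4-rank launch typically looks roughly like this (an illustrative command assuming the standard examples/run.py script, reusing the engine path from the reproduction above; adjust the tokenizer path and output length as needed):

mpirun -n 4 --allow-run-as-root python3 examples/run.py --engine_dir /tmp/.djl.ai/trtllm/194114b4749e0bcd195044eff1e74c328b917953/37ada4b27fdc4767ab3c7677e281a42fbdb95a90/1 --tokenizer_dir google/gemma-7b --max_output_len 32 --input_text "Hello"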

ydm-amazon commented 3 months ago

This is the Gemma model, whose convert_checkpoint.py has no --tp_size option (it takes --world-size instead, as shown above). I am also specifying the correct world size for mpirun.

lanking520 commented 3 months ago

I am also able to reproduce the issue.

Conversion script:

python3 /usr/local/lib/python3.10/dist-packages/tensorrt_llm_toolkit/build_scripts/gemma/convert_checkpoint.py --dtype float16 --world-size 4 --model-dir google/gemma-7b --output-model-dir /tmp/trtllm_gemma_ckpt/ --ckpt-type hf

After model conversion, I can clearly see 4 rank safetensors saved:

# ls  /tmp/trtllm_gemma_ckpt/
config.json  rank0.safetensors  rank1.safetensors  rank2.safetensors  rank3.safetensors

During the engine build phase

# trtllm-build --tp_size 4 --checkpoint_dir /tmp/trtllm_gemma_ckpt/ --log_level info --gemm_plugin float16 --output_dir /tmp/.djl.ai/trtllm/4647f76cc28bee0fdd3f41d68a8620656f876497/google-gemma-7b/1 --workers 1 --gpt_attention_plugin float16 --paged_kv_cache enable --context_fmha enable --max_beam_width 1 --remove_input_padding enable --use_custom_all_reduce disable --use_paged_context_fmha enable --use_fp8_context_fmha disable --max_batch_size 16 --max_input_len 1024 --max_seq_len 1024 --use_fused_mlp

It only generates 1 rank engine file:

/tmp/.djl.ai/trtllm/4647f76cc28bee0fdd3f41d68a8620656f876497/google-gemma-7b/1/
config.json  rank0.engine
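
For comparison, a quick check of how many rank engines the build actually produced (a sketch; with tp_size 4 and pp_size 1 one would expect rank0.engine through rank3.engine):

import glob

# Count the rank engines in the build output directory from the command above.
engines = sorted(glob.glob(
    "/tmp/.djl.ai/trtllm/4647f76cc28bee0fdd3f41d68a8620656f876497/google-gemma-7b/1/rank*.engine"))
print(len(engines), engines)  # currently 1 engine instead of the expected 4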

symphonylyh commented 3 months ago

@ydm-amazon @lanking520 @LanceB57 Gemma TP has a problem, but we have fixed it internally and will release the fix in next Tuesday's weekly update. The issue was that some functions were reading TP information from Mapping, which was not set correctly.
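
For context, a rough sketch of where that TP information lives on the Python side (an illustration of a correctly populated Mapping, assuming the public tensorrt_llm.Mapping class; this is not the internal fix itself):

from tensorrt_llm import Mapping

tp_size, pp_size = 4, 1
# A correctly populated Mapping carries the parallelism that downstream code reads.
mapping = Mapping(world_size=tp_size * pp_size, rank=0,
                  tp_size=tp_size, pp_size=pp_size)
print(mapping.world_size, mapping.tp_size, mapping.pp_size)  # 4 4 1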

github-actions[bot] commented 2 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.