NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Performance issues with TP and PP settings #1990

Open luoyang1999 opened 1 month ago

luoyang1999 commented 1 month ago

I am running inference on 4 A10 GPUs connected over PCIe. The only configuration that gives a speedup is TP=4 with PP=1; with TP=2 and PP=2, inference speed drops significantly. This is counterintuitive: over a PCIe bus, PP should generally reduce communication overhead, and because a decoder-based model splits cleanly into equal stages, the compute per stage should be about the same. I observed that only the first GPU reaches 95% utilization, while utilization on the remaining GPUs is very low. Could this be caused by a parameter setting, or is there another possible explanation?
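The communication argument above can be made concrete with a rough back-of-the-envelope sketch. It is not a model of TensorRT-LLM's actual scheduler or kernels. It assumes Megatron-style TP (two all-reduces per decoder layer, ring all-reduce), one hidden-state transfer per PP stage boundary, and an idealized GPipe-style schedule with equal stage times. The function name `estimate` and the example sizes (`hidden_size=4096`, `num_layers=32`, fp16) are hypothetical and do not come from this thread.

```python
def estimate(hidden_size, num_layers, tp, pp, num_microbatches=1, bytes_per_elem=2):
    """Rough per-token communication and idle-time estimate for a TP x PP layout.

    Assumptions (not taken from TensorRT-LLM itself):
      * TP: two all-reduces per decoder layer, ring all-reduce, so each GPU
        sends 2 * (tp - 1) / tp times the message size per all-reduce.
      * PP: one hidden-state vector crosses each of the (pp - 1) stage boundaries.
      * Bubble: idealized GPipe-style schedule with equal stage times.
    """
    msg_bytes = hidden_size * bytes_per_elem   # one token's hidden state
    layers_per_stage = num_layers // pp        # layers held by each pipeline stage

    # Bytes sent per GPU per token by the tensor-parallel all-reduces.
    tp_bytes = layers_per_stage * 2 * msg_bytes * 2 * (tp - 1) / tp

    # Bytes crossing pipeline stage boundaries per token.
    pp_bytes = (pp - 1) * msg_bytes

    # Fraction of time a stage sits idle in the idealized schedule.
    bubble = (pp - 1) / (num_microbatches + pp - 1)

    return {"tp_bytes": tp_bytes, "pp_bytes": pp_bytes, "bubble": bubble}


# Hypothetical 32-layer model with hidden size 4096 in fp16.
tp4_pp1 = estimate(4096, 32, tp=4, pp=1)
tp2_pp2 = estimate(4096, 32, tp=2, pp=2)
```

Under these assumptions the TP=2, PP=2 layout does move fewer bytes per token, but with a single micro-batch in flight each stage is idle half of the time. So lower communication volume alone does not guarantee higher throughput if the pipeline is not kept full.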

jinyangyuan-nvidia commented 1 month ago

Could you provide the commit you are using?

luoyang1999 commented 1 month ago

> Could you provide the commit you are using?

I am using the release tag v0.9.0 (commit ID 250d9c2). Does the newer version include updates related to parallel computing?

jinyangyuan-nvidia commented 1 month ago

Yes, there used to be a performance issue with PP + IFB (in-flight batching), and v0.9.0 is affected. It has been fixed on the main branch since commit a96cccafcf6365c128f004f779160951f8c0801c. Could you try the latest main branch and see whether that solves the problem?
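One way to confirm that a local source checkout already includes that fix is to ask git whether the commit is an ancestor of `HEAD`. The following is a minimal sketch, assuming a full (non-shallow) git clone of the repository and `git` on the `PATH`. The helper name `contains_commit` and its `run` parameter are hypothetical, added only for illustration.

```python
import subprocess

# Fix commit quoted in the reply above.
FIX_COMMIT = "a96cccafcf6365c128f004f779160951f8c0801c"


def contains_commit(repo_dir, commit=FIX_COMMIT, run=subprocess.run):
    """Return True if `commit` is an ancestor of HEAD in the checkout at `repo_dir`.

    `git merge-base --is-ancestor A B` exits with 0 when A is an ancestor of B,
    1 when it is not, and another non-zero code on errors (for example when the
    commit is unknown, as can happen in a shallow clone).
    """
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", commit, "HEAD"],
        capture_output=True,
    )
    return result.returncode == 0
```

For example, `contains_commit("/path/to/TensorRT-LLM")` would be expected to return `False` on a checkout of the v0.9.0 tag, given that the reply says v0.9.0 still has the issue. The path here is a placeholder.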

luoyang1999 commented 1 month ago

Thank you. I will test on the latest main branch and report the results later.

github-actions[bot] commented 2 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.