NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

server.cc:251] failed to enable peer access for some device pairs #1754

Godlovecui opened this issue 5 months ago

Godlovecui commented 5 months ago

System Info

GPUs: 8 × RTX 4090
TensorRT-LLM: v0.9.0
tensorrtllm_backend: v0.9.0

Who can help?

@kaiyux @BY

Information

Tasks

Reproduction

None

Expected behavior

None

Actual behavior

None

Additional notes

When I deploy llama3-8B in Triton server, it raises the error below: [image]. However, it also prints the server-launched-successfully flag: [image]. But when I send requests to the server, it fails: [image] [image]

How to fix it? Thank you~

nv-guomingz commented 5 months ago

Hi @Godlovecui, I saw you're using 0.9.0 TensorRT-LLM. Is it possible to try the latest main branch and see whether the issue still exists?

TheCodeWrangler commented 4 months ago

Have you tried

`nvidia-smi topo -p2p r`

to inspect whether the drivers for your GPUs are installed and support peer-to-peer access?
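To automate that check across all pairs, a small script can run the command and flag any pair the driver reports as unsupported. This is a minimal sketch: the matrix layout it parses (a header row of GPU labels, then one row per GPU with `X` on the diagonal and `NS` for "not supported") is an assumption about typical `nvidia-smi topo` output, and the `NS` token may differ across driver versions.

```python
import subprocess

def find_unsupported_pairs(topo_output):
    """Parse a `nvidia-smi topo -p2p r` style matrix and return the
    (row_gpu, col_gpu) pairs reported as NS (not supported).
    NOTE: the expected layout is an assumption, not a documented format."""
    lines = [l for l in topo_output.splitlines() if "GPU" in l]
    if not lines:
        return []
    header = lines[0].split()  # column labels: GPU0, GPU1, ...
    pairs = []
    for row in lines[1:]:
        cells = row.split()
        row_gpu, statuses = cells[0], cells[1:]
        for col_gpu, status in zip(header, statuses):
            if status == "NS":
                pairs.append((row_gpu, col_gpu))
    return pairs

if __name__ == "__main__":
    # Requires an NVIDIA driver; guarded so the module imports without one.
    out = subprocess.run(["nvidia-smi", "topo", "-p2p", "r"],
                         capture_output=True, text=True).stdout
    print(find_unsupported_pairs(out) or "all pairs support P2P read")
```

If the script reports `NS` for the pairs named in the `server.cc:251` log line, the failure comes from the hardware/driver topology rather than from TensorRT-LLM itself.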

Also, I have encountered similar issues where my default GPU installation required me to build the engine with the `use_custom_all_reduce` flag disabled.
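For reference, a rebuild with that flag disabled might look like the following. This is a sketch, not a verified recipe: the checkpoint and output directories are placeholders, and `--use_custom_all_reduce` is recalled from the v0.9-era `trtllm-build` CLI, so check `trtllm-build --help` on your installed version.

```shell
# Hypothetical rebuild of the llama3-8B engine with the custom
# all-reduce kernel disabled; paths are placeholders.
trtllm-build \
    --checkpoint_dir ./llama3-8b-ckpt \
    --output_dir ./llama3-8b-engine \
    --use_custom_all_reduce disable
```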

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.