NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
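As a point of reference for the Python API described above, here is a minimal sketch using the high-level `LLM` entry point with tensor parallelism. This assumes a recent TensorRT-LLM release that ships the `LLM`/`SamplingParams` API; the model name is a placeholder:

```python
# Minimal sketch of the high-level TensorRT-LLM Python API
# (assumes a recent release with the `LLM` entry point;
# the model path below is a placeholder).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-hf",  # HF model name or local path
          tensor_parallel_size=2)            # shard the model across 2 GPUs

prompts = ["Question: which city is this? Answer:"]
outputs = llm.generate(prompts, SamplingParams(max_tokens=64))
for out in outputs:
    print(out.outputs[0].text)
```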

How to use multi-gpu in running llava? #1003

xiaocaoxu opened this issue 10 months ago (status: Open)

xiaocaoxu commented 10 months ago

System Info

GPU: 3090, CUDA: 12.2

Who can help?

@ncomly-nvidia @symphonylyh

Reproduction

Build the LLM engine:

python ../llama/build.py --model_dir /models/llava-llama-2-finetune_full-mmcm-2023-11-01-03-15-20/ --dtype float32 --remove_input_padding --use_gpt_attention_plugin float32 --enable_context_fmha --use_gemm_plugin float32 --output_dir /models/llava_trt/1.0/fp32/2-gpu/ --max_batch_size 1 --world_size 2 --tp_size 2 --max_prompt_embedding_table_size 576

Build the visual engine:

python build_visual_engine.py --model_name llava-v1.5-7b --model_path /models/llava-llama-2-finetune_full-mmcm-2023-11-01-03-15-20

Run:

mpirun -n 2 --allow-run-as-root python run.py --max_new_tokens 512 --input_text "Question: which city is this? Answer:" --hf_model_dir /models/llava-llama-2-finetune_full-mmcm-2023-11-01-03-15-20 --visual_engine_dir visual_engines/llava-v1.5-7b --llm_engine_dir /models/llava_trt/1.0/fp32/2-gpu --decoder_llm
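Before digging into engine problems, it can help to confirm that each MPI rank launched by `mpirun -n 2` actually binds to its own GPU. A minimal sanity-check sketch (not part of TensorRT-LLM; assumes `mpi4py` and `torch` are installed, and the file name `check_ranks.py` is arbitrary):

```python
# check_ranks.py -- hypothetical sanity check, not part of TensorRT-LLM.
# Confirms that each MPI rank launched by mpirun sees its own GPU.
from mpi4py import MPI
import torch

rank = MPI.COMM_WORLD.Get_rank()
torch.cuda.set_device(rank)  # rank 0 -> GPU 0, rank 1 -> GPU 1
print(f"rank {rank} -> {torch.cuda.get_device_name(rank)}")
```

Launch it the same way as run.py, e.g. `mpirun -n 2 --allow-run-as-root python check_ranks.py`.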

Get an error. (The original report attached a screenshot of the error; the image is not reproduced here.)

Expected behavior

LLaVA runs inference across multiple GPUs.

actual behavior

The run fails with the error shown in the screenshot above.

additional notes

None

kaiyux commented 9 months ago

Hi @xiaocaoxu, we pushed an update to the main branch that should contain the fix for this issue. Can you please verify on the latest main branch? Thank you.
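Before retrying, a quick way to confirm which TensorRT-LLM build is actually installed (a minimal sketch; `__version__` is the package's standard version attribute):

```python
# Print the installed TensorRT-LLM version before re-running the
# reproduction steps against the latest main branch.
import tensorrt_llm
print(tensorrt_llm.__version__)
```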

nv-guomingz commented 1 week ago

Hi @xiaocaoxu, do you still have any further issues or questions? If not, we'll close this soon.