NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

How to test the time to new token of a model in TensorRT-LLM #1805

Open Ourspolaire1 opened 1 week ago

Ourspolaire1 commented 1 week ago

I found that the benchmark/suite outputs the time to first token. However, when I run `python benchmark.py --model meta-llama/Llama-2-7b-hf static --isl 128 --osl 128 --batch 1`, an error occurs:

```
mpirun was unable to launch the specified application as it could not access or execute an executable:

Executable: /app/tensorrt_llm/benchmarks/suite/tensorrt_llm_bench/../../../cpp/build/benchmarks/gptSessionBenchmark
Node: 22f553c1f930

while attempting to start process rank 0.
```

How can I solve it? Thanks!
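For context, the mpirun failure indicates the C++ benchmark binary (`gptSessionBenchmark`) that the Python suite shells out to was never built. A sketch of building it from a source checkout, assuming the `--benchmarks` option exists in your version of `build_wheel.py` (verify with `python3 scripts/build_wheel.py --help`):

```shell
# From the TensorRT-LLM repository root: rebuild the wheel together with
# the C++ benchmark executables (gptSessionBenchmark among them).
python3 ./scripts/build_wheel.py --benchmarks

# The binaries should then appear under cpp/build/benchmarks/,
# which is the path the suite's mpirun command expects.
ls cpp/build/benchmarks/
```

This only applies to source builds; pip-installed wheels do not ship the C++ benchmark binaries.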

nv-guomingz commented 1 week ago

What's your TensorRT-LLM version and your device configuration?

Ourspolaire1 commented 1 week ago

> what's your tensorrt-llm version and your device cfg?

TensorRT-LLM version: 0.10.0.dev2024041600
System: Ubuntu 22.04.3 LTS
Driver version: 555.42.02
CUDA compilation tools: release 12.3, V12.3.107
GPU: NVIDIA A10
Python: 3.10.12
Thanks!

nv-guomingz commented 1 week ago

Is it possible to try the latest release, https://pypi.org/project/tensorrt-llm/0.11.0.dev2024061800/? Your TRT-LLM version was released about two months ago.

Ourspolaire1 commented 1 week ago

> Is it possible to try the latest release https://pypi.org/project/tensorrt-llm/0.11.0.dev2024061800/? Your trt-llm version was released about 2 months ago.

I used tensorrt_llm-0.11.0.dev2024061800, but the same error occurs.

nv-guomingz commented 1 week ago

> Is it possible to try the latest release https://pypi.org/project/tensorrt-llm/0.11.0.dev2024061800/? Your trt-llm version was released about 2 months ago.
>
> I used the tensorrt_llm-0.11.0.dev2024061800, but same error occurs

OK, we'll take a look next week, since the team is on vacation from 6/21 to 6/22.

nv-guomingz commented 2 days ago

@Ourspolaire1 Just want to double-confirm: you're using the command `python benchmark.py --model meta-llama/Llama-2-7b-hf static --isl 128 --osl 128 --batch 1` and got the error mentioned above, right?

However, I got this error:

```
benchmark.py: error: argument -m/--model: invalid choice: 'meta-llama/Llama-2-7b-hf' (choose from 'whisper_large_v3', 'roberta_base', 'baichuan_7b', 't5_large', 'llama_70b_sq_per_tensor', 'llama_7b', 'recurrentgemma_2b', 'chatglm2_6b', 't5_small', 'starcoder2_3b', 'falcon_180b', 't5_11b', 'gpt_350m', 'baichuan_13b_chat', 'flan_t5_large', 'glm_10b', 'opt_2.7b', 'mamba_370m', 'gpt_350m_sq_per_tensor', 'bert_large', 'opt_350m', 'baichuan2_7b_chat', 'flan_t5_base', 'llama_30b', 'gptneox_20b', 'opt_66b', 'bart_large_cnn', 'gpt_next_2b', 'opt_30b', 'opt_6.7b', 'mamba_1.4b', 't5_base', 'flan_t5_xl', 'qwen1.5_7b_chat', 'flan_t5_small', 'falcon_rw_1b', 'gpt_1.5b', 'llama_70b_long_generation', 'internlm_chat_7b', 'chatglm_6b', 'mbart_large_50_many_to_one_mmt', 'bert_base', 'gpt_175b', 't5_3b', 'gptj_6b', 'baichuan2_13b_chat', 'qwen1.5_14b_chat', 'gpt_350m_moe', 'qwen_7b_chat', 'mamba_790m', 'falcon_7b', 'mixtral_8x7b', 'bloom_560m', 'bloom_176b', 'starcoder_15.5b', 'llama_70b', 'chatglm3_6b', 'mamba_130m', 'falcon_40b', 'qwen_14b_chat', 'llama_13b', 'mamba_2.8b', 'flan_t5_xxl', 'llama_70b_long_context', 'internlm_chat_20b', 'gpt_350m_sq_per_token_channel')
```

May I know the exact model name you specified?
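The choices in that error message suggest the suite takes registered model aliases rather than Hugging Face repo IDs. A possible invocation, assuming the `llama_7b` alias from the list corresponds to the Llama-2-7B configuration you want:

```shell
# Same static benchmark, but using the suite's registered alias
# instead of the Hugging Face repo ID meta-llama/Llama-2-7b-hf.
python benchmark.py --model llama_7b static --isl 128 --osl 128 --batch 1
```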

Ourspolaire1 commented 2 days ago

> @Ourspolaire1 Just wanna to double confirm that u're using this cmd python benchmark.py --model meta-llama/Llama-2-7b-hf static --isl 128 --osl 128 --batch 1 and got the error mentioned above, right?
>
> However, I got this error benchmark.py: error: argument -m/--model: invalid choice: 'meta-llama/Llama-2-7b-hf' (choose from 'whisper_large_v3', 'roberta_base', 'baichuan_7b', 't5_large', 'llama_70b_sq_per_tensor', 'llama_7b', 'recurrentgemma_2b', 'chatglm2_6b', 't5_small', 'starcoder2_3b', 'falcon_180b', 't5_11b', 'gpt_350m', 'baichuan_13b_chat', 'flan_t5_large', 'glm_10b', 'opt_2.7b', 'mamba_370m', 'gpt_350m_sq_per_tensor', 'bert_large', 'opt_350m', 'baichuan2_7b_chat', 'flan_t5_base', 'llama_30b', 'gptneox_20b', 'opt_66b', 'bart_large_cnn', 'gpt_next_2b', 'opt_30b', 'opt_6.7b', 'mamba_1.4b', 't5_base', 'flan_t5_xl', 'qwen1.5_7b_chat', 'flan_t5_small', 'falcon_rw_1b', 'gpt_1.5b', 'llama_70b_long_generation', 'internlm_chat_7b', 'chatglm_6b', 'mbart_large_50_many_to_one_mmt', 'bert_base', 'gpt_175b', 't5_3b', 'gptj_6b', 'baichuan2_13b_chat', 'qwen1.5_14b_chat', 'gpt_350m_moe', 'qwen_7b_chat', 'mamba_790m', 'falcon_7b', 'mixtral_8x7b', 'bloom_560m', 'bloom_176b', 'starcoder_15.5b', 'llama_70b', 'chatglm3_6b', 'mamba_130m', 'falcon_40b', 'qwen_14b_chat', 'llama_13b', 'mamba_2.8b', 'flan_t5_xxl', 'llama_70b_long_context', 'internlm_chat_20b', 'gpt_350m_sq_per_token_channel')
>
> May I know the exactly model name u specified?

Is the benchmark.py in the folder suite/tensorrt_llm? By the way, I want to test the time to new token of Llama 3 and Qwen2; I was wondering how to test them. Thanks!
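Independent of the suite, time-to-first-token and time-per-new-token can be measured around any streaming generation loop. A generic sketch, where `fake_stream` is a stand-in generator simulating a model that yields tokens as they are produced (it is not a TensorRT-LLM API; a real run would wrap the model's streaming generate call instead):

```python
import time

def measure_token_latencies(stream):
    """Record time-to-first-token (TTFT) and per-token gaps from any
    iterable that yields tokens as they are generated."""
    start = time.perf_counter()
    arrival_times = []
    for _ in stream:
        arrival_times.append(time.perf_counter() - start)
    ttft = arrival_times[0]
    # Gaps between consecutive arrivals = time to each *new* token.
    gaps = [b - a for a, b in zip(arrival_times, arrival_times[1:])]
    return ttft, gaps

def fake_stream(n_tokens=5, delay=0.01):
    """Stand-in for a streaming generate call: one token every ~10 ms."""
    for i in range(n_tokens):
        time.sleep(delay)
        yield f"tok{i}"

ttft, gaps = measure_token_latencies(fake_stream())
print(f"TTFT: {ttft * 1000:.1f} ms, "
      f"mean time per new token: {sum(gaps) / len(gaps) * 1000:.1f} ms")
```

Swapping `fake_stream()` for the model's actual token stream gives the same two metrics for Llama 3 or Qwen2, whatever runtime serves them.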