NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

C++ example code for GPT model inference #2142

Open AmazDeng opened 3 weeks ago

AmazDeng commented 3 weeks ago

Could TensorRT-LLM provide C++ example code for GPT model inference? I noticed that the official examples are all in Python. Could you provide a C++ version? @kaiyux @Shixiaowei02 @nv-guomingz

zhangts20 commented 2 weeks ago

@AmazDeng You can find them in https://github.com/NVIDIA/TensorRT-LLM/tree/v0.11.0/examples/cpp/executor
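
For reference, the basic executor example in that directory boils down to roughly the following sketch. The engine path and token IDs below are placeholders, and exact type names and signatures can differ between releases, so treat this as an outline of what the linked example does with the C++ Executor API rather than a drop-in program:

```cpp
#include "tensorrt_llm/executor/executor.h"
#include "tensorrt_llm/plugins/api/tllmPlugin.h"

#include <iostream>
#include <vector>

namespace tle = tensorrt_llm::executor;

int main()
{
    // Register the TensorRT-LLM plugins before loading the engine,
    // as the official example does.
    initTrtLlmPlugins();

    // Executor configuration; beam width 1 here.
    tle::ExecutorConfig executorConfig(1);

    // Create the executor from a directory containing a pre-built TensorRT engine
    // ("/path/to/engine_dir" is a placeholder for the trtllm-build output).
    tle::Executor executor("/path/to/engine_dir", tle::ModelType::kDECODER_ONLY, executorConfig);

    // Enqueue a request: tokenized prompt plus the number of new tokens to generate
    // (the token IDs are placeholders; a real program tokenizes the prompt first).
    tle::SizeType32 maxNewTokens = 16;
    tle::VecTokens inputTokenIds{1, 2, 3, 4};
    tle::IdType requestId = executor.enqueueRequest(tle::Request(inputTokenIds, maxNewTokens));

    // Block until the response for this request is ready, then print the tokens of beam 0.
    std::vector<tle::Response> responses = executor.awaitResponses(requestId);
    if (!responses.empty() && !responses.front().hasError())
    {
        tle::Result result = responses.front().getResult();
        for (auto token : result.outputTokenIds.at(0))
        {
            std::cout << token << " ";
        }
        std::cout << std::endl;
    }
    return 0;
}
```

The linked directory contains the complete, buildable versions of this example.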

AmazDeng commented 2 weeks ago

@zhangts20 Thank you. I'll check it out later.

AmazDeng commented 1 week ago

@zhangts20 Sorry, I may not have expressed my needs clearly. What I am looking for is C++ code for a multimodal model, one that takes both image and text inputs, like LLaVA or BLIP-2. What you provided is the C++ code for a GPT model that only takes text input.

zhangts20 commented 1 week ago

I am also looking for this, but there has been no progress.

lfr-0531 commented 1 week ago

@zhangts20 @AmazDeng Currently, TensorRT-LLM supports multimodal models only with the Python runtime; C++ runtime support will be added later.