NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Using TensorRT-LLM/TensorRT to compile custom model based on Mistral LLM and ViT image encoder #1401

Closed anjali-chadha closed 5 months ago

anjali-chadha commented 7 months ago

I have a model that combines two components:

  1. Image Encoder: Based on the ViT-G/14 vision transformer model.
  2. Language Model: A Mistral-based large language model (LLM).

At a high level, the output from the Image Encoder is processed, concatenated with other tokens, and then fed into the Mistral LLM.
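A minimal pure-Python sketch of the fusion step described above: visual embeddings from the encoder are spliced between text-token embeddings before the decoder consumes the combined sequence. All names and shapes here are illustrative assumptions, not taken from the actual model.

```python
def build_multimodal_input(image_embeds, prefix_embeds, suffix_embeds):
    """Concatenate [prefix tokens][image embeddings][suffix tokens] into a
    single embedding sequence. Each argument is a list of equal-length
    vectors (image embeddings are assumed to already be projected into the
    LLM's hidden dimension)."""
    all_vecs = prefix_embeds + image_embeds + suffix_embeds
    dim = len(all_vecs[0])
    # Every vector must share the LLM's hidden size for the splice to be valid.
    assert all(len(v) == dim for v in all_vecs)
    return all_vecs
```

The resulting sequence length is `len(prefix) + num_image_embeddings + len(suffix)`, which is what the decoder's context phase would see.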

In my current implementation, I have a single class that initializes these models and performs a forward pass on them.

For running inference on this model, I'm exploring TensorRT and TensorRT-LLM. It seems like these components can be individually compiled—Mistral is supported in TensorRT-LLM, and ViT-G can be compiled using TensorRT.

My question is: How can I leverage both TensorRT and TensorRT-LLM to run inference on this custom vision-language architecture? Specifically:

  1. Is it possible to compile and optimize the two components (ViT-G and Mistral) separately using their respective tools (TensorRT and TensorRT-LLM)?
  2. If so, how can I combine the optimized components at inference time to run the entire vision-language pipeline efficiently?

Any guidance or examples on this would be greatly appreciated. Thank you!

byshiue commented 7 months ago

You can refer to the multimodal examples here.
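For context, the multimodal examples follow a prompt-table pattern: the vision encoder (built as a TensorRT engine) produces embeddings that are handed to the TensorRT-LLM runtime as a prompt-embedding table, and the text input ids contain "fake" ids at or above `vocab_size` that index into that table instead of the vocabulary. A toy sketch of that lookup, with illustrative names rather than the real runtime API:

```python
def splice_image_tokens(text_ids, num_image_embeds, vocab_size, image_pos):
    """Insert fake token ids [vocab_size, vocab_size + n) at image_pos,
    one id per image embedding row in the prompt table."""
    fake_ids = list(range(vocab_size, vocab_size + num_image_embeds))
    return text_ids[:image_pos] + fake_ids + text_ids[image_pos:]

def resolve_embeddings(ids, vocab_embeddings, prompt_table, vocab_size):
    """Mimic the runtime's lookup: normal ids read the vocabulary
    embedding table, fake ids read row (id - vocab_size) of the
    prompt table holding the vision encoder's output."""
    return [vocab_embeddings[i] if i < vocab_size
            else prompt_table[i - vocab_size]
            for i in ids]
```

With this scheme the two engines stay independent: the ViT engine runs first, its output fills the prompt table, and the LLM engine consumes the spliced id sequence.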

github-actions[bot] commented 5 months ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] commented 5 months ago

This issue was closed because it has been stalled for 15 days with no activity.