NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

support llava-next model #1900

Open AmazDeng opened 3 months ago

AmazDeng commented 3 months ago

LLaVA-NeXT and LLaVA-NeXT-Video are fairly capable multimodal models, and they are already supported in transformers. I would like to know whether TensorRT-LLM plans to support these two models.

https://github.com/LLaVA-VL/LLaVA-NeXT
https://huggingface.co/docs/transformers/model_doc/llava_next
https://huggingface.co/docs/transformers/model_doc/llava-next-video

AdamzNV commented 1 day ago

As more and more new models enter the market, we have prepared comprehensive instructions for TRT-LLM developers on adapting the framework to new models of interest. We encourage community developers to expand the range of supported models, fostering an open ecosystem with rapid iteration.

Please try following these instructions and let us know if you encounter any issues during the adaptation process. We greatly appreciate your contributions.