Open Onwaydbh opened 1 month ago
Same problem
We support several popular multimodal models in `examples/multimodal/`. For these models, we pass the image embedding input to the LLM via the `prompt_table` argument (this extends the LLM's embedding table) and modify `input_ids` with indices into `prompt_table`. You can check `tensorrt_llm/runtime/multimodal_model_runner.py` to see how this mechanism is used for different models.
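To make the mechanism above concrete, here is a minimal conceptual sketch (not the actual TensorRT-LLM API; the function and variable names are hypothetical). The idea is that image embeddings occupy "virtual" token ids at or above `vocab_size`, so `input_ids` can mix real text tokens with indices that route into `prompt_table` at embedding-lookup time:

```python
import numpy as np

def embed_with_prompt_table(input_ids, word_embeddings, prompt_table):
    """Look up embeddings for a sequence mixing text and image tokens.

    Ids < vocab_size index the normal word-embedding table; ids >=
    vocab_size are virtual tokens that index into prompt_table (the
    image embeddings), offset by vocab_size.
    """
    vocab_size = word_embeddings.shape[0]
    out = np.empty((len(input_ids), word_embeddings.shape[1]))
    for pos, tok in enumerate(input_ids):
        if tok < vocab_size:
            out[pos] = word_embeddings[tok]          # ordinary text token
        else:
            out[pos] = prompt_table[tok - vocab_size]  # image embedding slot
    return out

# Toy setup: vocab of 100 text tokens, 4 image-patch embeddings, hidden dim 8.
vocab_size, num_patches, hidden = 100, 4, 8
word_embeddings = np.random.rand(vocab_size, hidden)
prompt_table = np.random.rand(num_patches, hidden)  # image embeddings

# Virtual ids 100..103 mark where the image patches sit in the prompt.
input_ids = [5, 100, 101, 102, 103, 17]
embedded = embed_with_prompt_table(input_ids, word_embeddings, prompt_table)
```

In the real runner the same routing happens inside the engine's prompt-tuning path; this sketch only shows why modifying `input_ids` with out-of-vocabulary indices is enough to splice image embeddings into the LLM input.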
The email you sent me has been received.
For example, I want to use a visual pretrained language model to produce the image embedding, and add it to the LLM input to get the output.