Open Onwaydbh opened 1 month ago
Same problem
We support several popular multimodal models in `examples/multimodal/`. For these models, we pass the image embedding input to the LLM via the `prompt_table` argument (this extends the LLM's embedding table) and modify `input_ids` with indices into `prompt_table`. You can check `tensorrt_llm/runtime/multimodal_model_runner.py` to see how this mechanism is used for different models.
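To make the mechanism above concrete, here is a minimal conceptual sketch (not the actual TensorRT-LLM API; the function and variable names are hypothetical). The idea is that image embeddings occupy "virtual" token ids at or above `vocab_size`, so `input_ids` can mix real text tokens with indices that route into `prompt_table` at embedding-lookup time:

```python
import numpy as np

def embed_with_prompt_table(input_ids, word_embeddings, prompt_table):
    """Look up embeddings for a sequence mixing text and image tokens.

    Ids < vocab_size index the normal word-embedding table; ids >=
    vocab_size are virtual tokens that index into prompt_table (the
    image embeddings), offset by vocab_size.
    """
    vocab_size = word_embeddings.shape[0]
    out = np.empty((len(input_ids), word_embeddings.shape[1]))
    for pos, tok in enumerate(input_ids):
        if tok < vocab_size:
            out[pos] = word_embeddings[tok]          # ordinary text token
        else:
            out[pos] = prompt_table[tok - vocab_size]  # image embedding slot
    return out

# Toy setup: vocab of 100 text tokens, 4 image-patch embeddings, hidden dim 8.
vocab_size, num_patches, hidden = 100, 4, 8
word_embeddings = np.random.rand(vocab_size, hidden)
prompt_table = np.random.rand(num_patches, hidden)  # image embeddings

# Virtual ids 100..103 mark where the image patches sit in the prompt.
input_ids = [5, 100, 101, 102, 103, 17]
embedded = embed_with_prompt_table(input_ids, word_embeddings, prompt_table)
```

In the real runner the same routing happens inside the engine's prompt-tuning path; this sketch only shows why modifying `input_ids` with out-of-vocabulary indices is enough to splice image embeddings into the LLM input.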
The email you sent me has been received.
For example, I want to use a visual pretrained language model to produce the image embedding, and add it to the LLM input to get the output.