Open Popsicle0-0 opened 3 months ago
Take a look at how Qwen-VL is accelerated; that should make it clearer.
Thank you for your response. Yes, I’ve looked into how Qwen-VL implements the prompt_table, but I’m not sure if this approach is suitable for all multimodal models. Additionally, why do different models have different ways of generating the prompt_table? Where can I find reference information on this?
    def ptuning_setup(self, prompt_table, dtype, hidden_size, tasks, input_ids):
        if prompt_table is not None:
            task_vocab_size = torch.tensor([prompt_table.shape[1]],
                                           dtype=torch.int32,
                                           device="cuda")
            prompt_table = prompt_table.view(
                (prompt_table.shape[0] * prompt_table.shape[1],
                 prompt_table.shape[2]))
            prompt_table = prompt_table.cuda().to(
                dtype=tensorrt_llm._utils.str_dtype_to_torch(dtype))
        else:
            prompt_table = torch.empty([1, hidden_size]).cuda()
            task_vocab_size = torch.zeros([1]).cuda()

        if tasks is not None:
            tasks = torch.tensor([int(t) for t in tasks.split(',')],
                                 dtype=torch.int32,
                                 device="cuda")
            assert tasks.shape[0] == input_ids.shape[
                0], "Number of supplied tasks must match input batch size"
        else:
            tasks = torch.zeros([input_ids.size(0)], dtype=torch.int32).cuda()

        return [prompt_table, tasks, task_vocab_size]
I have the same feeling as you. https://github.com/NVIDIA/TensorRT-LLM/issues/2104
Could we have a short discussion? My email address is 1270660449@qq.com. Thank you!
@Popsicle0-0 The prompt_table definition depends on the position of the special <Image> tokens in the prompt, which is model-specific. The idea is to split the input ids into [pre_text_ids, prompt_table_ids, post_text_ids]. Some models skip either the pre_text or the post_text component.
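For illustration, here is a minimal sketch of that split in plain Python. It assumes a single image placeholder token in the prompt and that ids at or above vocab_size are treated as indices into the prompt table (offset by vocab_size). The helper names, the placeholder id, and the toy sizes are hypothetical and not part of the TensorRT-LLM API; a real model may place or count its image tokens differently.

```python
from typing import List, Optional


def build_input_ids(token_ids: List[int], image_token_id: int,
                    vocab_size: int, num_visual_tokens: int) -> List[int]:
    """Replace one image placeholder with a run of out-of-vocabulary ids.

    Hypothetical helper: assumes exactly one placeholder in the prompt.
    """
    if token_ids.count(image_token_id) != 1:
        raise ValueError("expected exactly one image placeholder token")
    pos = token_ids.index(image_token_id)
    pre_text_ids = token_ids[:pos]          # may be empty for some models
    post_text_ids = token_ids[pos + 1:]     # may be empty for some models
    # One fake id per visual feature vector, starting right after the vocab.
    prompt_table_ids = list(range(vocab_size, vocab_size + num_visual_tokens))
    return pre_text_ids + prompt_table_ids + post_text_ids


def prompt_table_row(token_id: int, vocab_size: int) -> Optional[int]:
    """Return the prompt-table row for a fake id, or None for a text token."""
    return token_id - vocab_size if token_id >= vocab_size else None


# Toy example: vocabulary of 100 tokens, id 99 stands in for <Image>,
# and the vision encoder is assumed to emit 3 feature vectors.
ids = build_input_ids([1, 2, 99, 3], image_token_id=99,
                      vocab_size=100, num_visual_tokens=3)
rows = [prompt_table_row(t, 100) for t in ids]
```

Under these assumptions, the model-specific part is only where the placeholder sits and how many visual tokens replace it; the visual features themselves would be passed separately as the prompt table rather than merged into the embeddings by hand.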
What conditions need to be met to use a prompt_table? I am trying to convert minicpm_llama3_v2.5. If I have a custom method for merging input_ids with the ViT output, where should that logic be applied? GenerationSession seems to accept only input_ids as input, and when I used input_embeds as the only input, various issues arose. I want to try prompt_table, but I don't know where to combine input_ids with the ViT output. Do you have any suggestions?