NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

How can I customize position_ids for my own model? #1797

Closed · littletomatodonkey closed this 3 months ago

littletomatodonkey commented 3 months ago

Hi, in my model some of the position ids are the same (they are all vision tokens). For example, for an input with seq_length 5 where ids 1~3 are vision tokens, the usual position ids are

[0, 1, 2, 3, 4]

But I want to set them as follows (all the vision tokens share one position id):

[0, 1, 1, 1, 2]
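
For concreteness, a minimal numpy sketch of how such ids can be derived (the vision_mask input here is hypothetical, not a TRT-LLM tensor):

```python
import numpy as np

def build_position_ids(vision_mask: np.ndarray) -> np.ndarray:
    """Give consecutive vision tokens one shared position id."""
    inc = np.ones(len(vision_mask), dtype=np.int64)
    inc[0] = 0
    # a vision token following another vision token keeps the same position
    inc[1:][vision_mask[1:] & vision_mask[:-1]] = 0
    return np.cumsum(inc)

mask = np.array([False, True, True, True, False])  # ids 1~3 are vision tokens
print(build_position_ids(mask))  # -> [0 1 1 1 2]
```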

Is there any reference in TRT-LLM for doing this with gpt_attention_plugin? Thanks!

hijkzzz commented 3 months ago

Please refer to the CogVLM: https://github.com/NVIDIA/TensorRT-LLM/blob/2a115dae84f13daaa54727534daa837c534eceb4/tensorrt_llm/layers/attention.py#L1469

littletomatodonkey commented 3 months ago

> Please refer to the CogVLM: https://github.com/NVIDIA/TensorRT-LLM/blob/2a115dae84f13daaa54727534daa837c534eceb4/tensorrt_llm/layers/attention.py#L1469

All position ids in CogVLM are fixed, and in the TRT-LLM CogVLM implementation position_ids is actually a dead input tensor for gpt_attention. Do you know which case can be referred to for custom input position_ids? Thanks!

littletomatodonkey commented 3 months ago

I solved it by implementing RoPE myself after the QKV calculation and setting the rope embedding type to None in gpt_attention_plugin.
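
For anyone hitting the same problem, the core of that workaround looks roughly like the PyTorch sketch below (the real code uses tensorrt_llm.functional ops inside the network definition; the shapes and names here are illustrative assumptions):

```python
import torch

def apply_rope(x: torch.Tensor, position_ids: torch.Tensor,
               theta: float = 10000.0) -> torch.Tensor:
    # x: [batch, seq, num_heads, head_dim]; position_ids: [batch, seq]
    head_dim = x.shape[-1]
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2,
                                             dtype=torch.float32) / head_dim))
    # repeated position ids yield identical angles, which is the point here
    angles = position_ids.float()[..., None] * inv_freq  # [batch, seq, head_dim/2]
    cos = angles.cos()[:, :, None, :]  # broadcast over heads
    sin = angles.sin()[:, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]  # interleaved even/odd pairs
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

# After the QKV projection, rotate q and k with the custom ids, then
# pass them to attention with its built-in RoPE turned off.
q = torch.randn(1, 5, 8, 64)
k = torch.randn(1, 5, 8, 64)
pos = torch.tensor([[0, 1, 1, 1, 2]])
q, k = apply_rope(q, pos), apply_rope(k, pos)
```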

avianion commented 2 months ago

@littletomatodonkey please share your code. I'm also running into the same issue.

littletomatodonkey commented 2 months ago

You can refer to the chatglm model build process. There are 3 steps (a rough sketch follows at the end of this comment):

  1. Compute the position embedding yourself from the input position ids (attention.py).
  2. In the gpt_attention plugin, set the rope type to none for your model type.
  3. Pass the position ids through to generation.py.

I use the Python runtime from TRT-LLM 0.7.1, since in newer versions it's hard to hack the code in the GptManager.
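
A rough sketch of steps 1 and 2 combined, in plain PyTorch for illustration (the real build expresses this with TRT-LLM tensors; the helper and table names here are my own assumptions):

```python
import torch

# Step 1: precompute a rotary cos/sin table once, then gather rows
# with the (possibly repeated) custom position ids at runtime.
def build_rope_table(max_positions: int, head_dim: int, theta: float = 10000.0):
    inv_freq = 1.0 / (theta ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.arange(max_positions).float()[:, None] * inv_freq
    return angles.cos(), angles.sin()  # each [max_positions, head_dim/2]

cos_table, sin_table = build_rope_table(max_positions=4096, head_dim=64)
position_ids = torch.tensor([[0, 1, 1, 1, 2]])  # step 3 feeds this in at runtime
cos = cos_table[position_ids]  # [batch, seq, head_dim/2]
sin = sin_table[position_ids]
# ...rotate q and k with cos/sin as in the earlier sketch, then build
# gpt_attention with rope disabled (step 2) so RoPE is not applied twice.
```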