[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)
Reproduction
In tensorrt_llm/models/enc_dec/model.py, I directly reference M2M100SinusoidalPositionalEmbedding, initialize it in EncDecEmbedding, and then call it directly in forward, as shown below:
def forward(
    self,
    input_ids,
    position_ids=None,
    token_type_ids=None,
    prompt_embedding_table=None,
    prompt_tasks=None,
    prompt_vocab_size=None,
):
    # position_ids and token_type_ids are provided inputs
    # and should not be formulated deterministically
    ptuning_args = []
    if self.use_prompt_tuning:
        ptuning_args = [prompt_embedding_table, prompt_tasks, prompt_vocab_size]
    x = self.vocab_embedding(input_ids, *ptuning_args) * self.embedding_scale
    self.register_network_output("word_embeddings", x)
    embed_pos = self.embed_positions(input_ids, x)
    embed_pos = embed_pos.to(x.device)
    hidden_states = x + embed_pos
    # if self.position_embedding:
    #     pos_emb = self.position_embedding(position_ids)
    #     self.register_network_output("position_embeddings", pos_emb)
    #     x = x + pos_emb
    # if self.token_type_embedding:
    #     x = x + self.token_type_embedding(token_type_ids)
    # if self.embedding_layernorm:
    #     x = self.embedding_layernorm(x)
    return hidden_states
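Since the HF module unpacks input_ids into (batch, seq), one workaround is to derive the position indices yourself and gather rows from a precomputed table, so nothing ever calls input_ids.size(). A minimal NumPy sketch of that idea (add_sinusoidal_positions and the random stand-in table are illustrative, not TRT-LLM API):

```python
import numpy as np

def add_sinusoidal_positions(x, table, offset=2):
    # x: (seq_len, hidden) token embeddings, batch dim already removed.
    # table: precomputed sinusoidal table indexed by absolute position.
    # m2m100 shifts positions by padding_idx + 1 = 2.
    seq_len = x.shape[0]
    positions = np.arange(offset, offset + seq_len)
    return x + table[positions]

hidden = 8
table = np.random.rand(16, hidden)  # stand-in for the real sinusoidal table
tokens = np.zeros((3, hidden))
out = add_sinusoidal_positions(tokens, table)
print(out.shape)  # (3, 8)
```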
Expected behavior
The engine should build successfully.
actual behavior
File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 162, in forward
bsz, seq_len = input_ids.size()
ValueError: not enough values to unpack (expected 2, got 1)
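For what it's worth, the unpacking error itself just means input_ids reaches the HF module with one dimension instead of the two it assumes. This is likely because TRT-LLM network inputs can be flattened 1-D tensors (e.g. with remove_input_padding). A tiny standalone demonstration of the failure mode:

```python
import numpy as np

# HF's M2M100SinusoidalPositionalEmbedding.forward starts with
#     bsz, seq_len = input_ids.size()
# i.e. it assumes a 2-D (batch, seq) tensor. A flattened 1-D
# input_ids triggers exactly the unpack error in the traceback:
flat_ids = np.array([5, 17, 42])  # shape (3,): no batch dimension
try:
    bsz, seq_len = flat_ids.shape
    failed = False
except ValueError as err:
    failed = True
    print(err)  # not enough values to unpack (expected 2, got 1)
```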
additional notes
I think I've described the issue clearly; if you want to reproduce it completely, I can upload the code I'm currently using to GitHub.
System Info
Hello, I am adapting m2m100 to TensorRT-LLM, but m2m100 uses SinusoidalPositionalEmbedding. What should I do to make it work? https://github.com/huggingface/transformers/blob/main/src/transformers/models/m2m_100/modeling_m2m_100.py#L86
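One possible direction (a sketch, not tested against TRT-LLM): the sinusoidal embedding has no learned parameters, so the table can be precomputed offline with NumPy following the fairseq-style formula that modeling_m2m_100.py implements, then baked into the checkpoint as a frozen embedding weight so the HF module is never invoked inside the network. The helper name and the exact offsets here are my assumptions:

```python
import numpy as np

def sinusoidal_table(num_positions, embedding_dim, padding_idx=1):
    """Precompute a fairseq/m2m100-style sinusoidal embedding table.

    Intended to mirror what M2M100SinusoidalPositionalEmbedding.get_embedding
    produces, so the table can be stored as a constant weight.
    """
    half_dim = embedding_dim // 2
    freq = np.exp(np.arange(half_dim) * -(np.log(10000.0) / (half_dim - 1)))
    angles = np.arange(num_positions)[:, None] * freq[None, :]
    table = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    if embedding_dim % 2 == 1:  # zero-pad the last column for odd dims
        table = np.concatenate([table, np.zeros((num_positions, 1))], axis=1)
    if padding_idx is not None:  # the padding position embeds to zeros
        table[padding_idx] = 0.0
    return table.astype(np.float32)

# m2m100: 1024 positions plus the padding_idx + 1 = 2 offset
table = sinusoidal_table(1026, 1024)
print(table.shape)  # (1026, 1024)
```

The table could then be indexed with runtime position_ids like any other embedding lookup.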
Who can help?
@ncomly-nvidia