NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

SinusoidalPositionalEmbedding #1099

Open dingjingzhen opened 9 months ago

dingjingzhen commented 9 months ago

System Info

Hello, I'm adapting m2m100 to TensorRT-LLM, but m2m100 uses SinusoidalPositionalEmbedding. What should I do to make this work? https://github.com/huggingface/transformers/blob/main/src/transformers/models/m2m_100/modeling_m2m_100.py#L86
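For context, a sinusoidal positional embedding is a fixed (non-learned) table built from sin/cos functions of the position, as in the original Transformer paper. A minimal sketch of the interleaved variant is below; note this is illustrative only, and the exact column layout in Hugging Face's M2M100SinusoidalPositionalEmbedding may differ (it also applies a padding-index offset):

```python
import numpy as np

def sinusoidal_table(num_positions: int, dim: int) -> np.ndarray:
    """Build a fixed sin/cos positional-embedding table of shape
    (num_positions, dim): even columns hold sin, odd columns hold cos."""
    positions = np.arange(num_positions)[:, None]                   # (P, 1)
    freqs = np.exp(-np.log(10000.0) * np.arange(0, dim, 2) / dim)   # (dim/2,)
    angles = positions * freqs[None, :]                             # (P, dim/2)
    table = np.zeros((num_positions, dim))
    table[:, 0::2] = np.sin(angles)
    table[:, 1::2] = np.cos(angles)
    return table
```

Because the table is a pure function of position and dimension, it can in principle be precomputed as a constant lookup table rather than evaluated inside the network graph.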

Who can help?

@ncomly-nvidia

Information

Tasks

Reproduction

In tensorrt_llm/models/enc_dec/model.py, I directly reference Hugging Face's M2M100SinusoidalPositionalEmbedding, initialize it in EncDecEmbedding, and then call it in forward, as shown below:

def forward(
    self,
    input_ids,
    position_ids=None,
    token_type_ids=None,
    prompt_embedding_table=None,
    prompt_tasks=None,
    prompt_vocab_size=None,
):
    # position_ids and token_type_ids are provided inputs
    # and should not be formulated deterministically
    ptuning_args = []
    if self.use_prompt_tuning:
        ptuning_args = [prompt_embedding_table, prompt_tasks, prompt_vocab_size]
    x = self.vocab_embedding(input_ids, *ptuning_args) * self.embedding_scale
    self.register_network_output("word_embeddings", x)
    embed_pos = self.embed_positions(input_ids, x)
    embed_pos = embed_pos.to(x.device)
    hidden_states = x + embed_pos

    # if self.position_embedding:
    #     pos_emb = self.position_embedding(position_ids)
    #     self.register_network_output("position_embeddings", pos_emb)
    #     x = x + pos_emb
    # if self.token_type_embedding:
    #     x = x + self.token_type_embedding(token_type_ids)

    # if self.embedding_layernorm:
    #     x = self.embedding_layernorm(x)

    return hidden_states

Expected behavior

The engine builds successfully.

actual behavior

File "/root/miniconda3/lib/python3.10/site-packages/transformers/models/m2m_100/modeling_m2m_100.py", line 162, in forward
    bsz, seq_len = input_ids.size()
ValueError: not enough values to unpack (expected 2, got 1)
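One plausible reading of this error: the Hugging Face module unpacks input_ids into two dimensions, (batch, seq_len), while inside the TensorRT-LLM graph input_ids can be a 1-D packed tensor (e.g. when padding removal is enabled), so there is only one dimension to unpack. A minimal shape-mismatch reproduction is below; the shapes are hypothetical and numpy's .shape stands in for torch's .size():

```python
import numpy as np

# A 2-D (batch, seq_len) tensor unpacks into two values as HF expects.
input_ids_2d = np.zeros((2, 5), dtype=np.int64)
bsz, seq_len = input_ids_2d.shape

# A 1-D packed token tensor has only one dimension to unpack.
input_ids_1d = np.zeros(10, dtype=np.int64)
try:
    bsz, seq_len = input_ids_1d.shape
except ValueError as err:
    print(err)  # not enough values to unpack (expected 2, got 1)
```

If that is the cause, the embedding lookup would need to be driven by the position ids (or a precomputed table indexed by them) rather than by the shape of input_ids.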

additional notes

I think I've described it clearly, but if you want to reproduce it completely, I can upload the code I'm currently using to GitHub.

hello-11 commented 1 week ago

@dingjingzhen Do you still have the problem? If not, we will close it soon.