NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

How to convert a Tensor object to torch.Tensor #363

Closed: ehuaa closed this issue 1 year ago

ehuaa commented 1 year ago

I'm working on deploying a HuggingFace RoBERTa model to TensorRT-LLM; it has a small tweak relative to BERT in the embeddings. In RobertaEmbeddings, position_ids is computed from input_ids (the original screenshots are reconstructed below). I need to convert input_ids to a torch.Tensor to implement that function. Can you tell me the API to achieve this, or how I can write this cleanly? Thanks!
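
A reconstruction of the referenced computation: HuggingFace's `transformers` derives RoBERTa position ids from `input_ids` via its `create_position_ids_from_input_ids` helper, roughly as follows.

```python
import torch

def create_position_ids_from_input_ids(input_ids, padding_idx, past_key_values_length=0):
    # Non-padding tokens get 1, padding tokens get 0.
    mask = input_ids.ne(padding_idx).int()
    # A cumulative count of real tokens gives 1-based positions; padding stays 0.
    incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
    # Shift by padding_idx so real positions start at padding_idx + 1.
    return incremental_indices.long() + padding_idx
```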

byshiue commented 1 year ago

The forward function in TensorRT-LLM is only used to define the network; it is not actually executed during inference.

One solution is to prepare a position table for all steps as a network input, and then gather the values you want during network building.
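
A hedged sketch of that approach, assuming a `prepare_inputs`-style model definition in which extra network inputs are declared as `tensorrt_llm.Tensor` objects; the tensor name `position_ids`, the zero-filled placeholder table, and its dimensions are illustrative assumptions, not a prescribed API:

```python
import numpy as np
import tensorrt as trt
import tensorrt_llm
from tensorrt_llm import Tensor
from tensorrt_llm.functional import constant, embedding

builder = tensorrt_llm.Builder()
network = builder.create_network()
with tensorrt_llm.net_guard(network):
    # Declare the precomputed position ids as an extra network input;
    # the host (e.g. run.py) fills this tensor at inference time.
    position_ids = Tensor(name='position_ids', dtype=trt.int32,
                          shape=[-1, -1])  # [batch_size, seq_len]
    # Placeholder position-embedding table (514 positions x 768 dims).
    pos_table = constant(np.zeros((514, 768), dtype=np.float32))
    # Gather the rows selected by position_ids during network building.
    pos_emb = embedding(position_ids, pos_table)
```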

ehuaa commented 1 year ago

> The forward function in TensorRT-LLM is only used to define the network; it is not actually executed during inference.
>
> One solution is to prepare a position table for all steps as a network input, and then gather the values you want during network building.

Thanks for your quick reply. So the better solution for me is to modify run.py in the examples/bert folder and insert position_ids into the inputs dict there, computed with my RoBERTa-specific method? @byshiue

byshiue commented 1 year ago

I think so. You can also implement the function with the APIs we provide, or implement a plugin to support the feature you want. If you want to add a new input as you describe, you also need to modify model.py.
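
A minimal sketch of that run.py tweak, assuming the example feeds the session an inputs dict keyed by tensor name and that model.py declares a matching `position_ids` input; the key names and the helper are assumptions, and the formula is the RoBERTa one shown earlier:

```python
import torch

def add_roberta_position_ids(inputs: dict, padding_idx: int = 1) -> dict:
    # Compute RoBERTa-style position ids from the already-prepared input_ids
    # and store them under the (assumed) network input name 'position_ids'.
    input_ids = inputs['input_ids']
    mask = input_ids.ne(padding_idx).int()
    inputs['position_ids'] = ((torch.cumsum(mask, dim=1).type_as(mask) * mask)
                              + padding_idx).int()
    return inputs

# Two right-padded sequences; 1 is RoBERTa's <pad> id.
inputs = {'input_ids': torch.tensor([[0, 31414, 2, 1], [0, 9226, 16, 2]])}
print(add_roberta_position_ids(inputs)['position_ids'])
# tensor([[2, 3, 4, 1], [2, 3, 4, 5]], dtype=torch.int32)
```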

ehuaa commented 1 year ago

> I think so. You can also implement the function with the APIs we provide, or implement a plugin to support the feature you want. If you want to add a new input as you describe, you also need to modify model.py.

Yes, and model.py in the bert folder already lists position_ids as an input to the forward function, so I think that's enough for my needs. I'll modify and test it later.

ehuaa commented 1 year ago

I have another problem: I see that the bert_attention function has no input field for attention_mask, but the tokenizer for BERT or XLM-RoBERTa returns an attention_mask along with input_ids. Has bert_attention already implemented attention masking, or does it just ignore it? Thanks @byshiue

byshiue commented 1 year ago

You are referring to the plugin case, so it uses input_lengths directly instead of reading an attention mask, as I explained above.
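
For context, a small sketch of how `input_lengths` can be derived from the tokenizer's `attention_mask`, assuming right-padded batches so each row's length is its count of non-padding tokens (the checkpoint name is just an example):

```python
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('roberta-base')  # example checkpoint
encoded = tokenizer(['a short sentence', 'a slightly longer example sentence'],
                    padding=True, return_tensors='pt')
# With right padding, each sequence length is simply the row sum of the mask;
# the bert_attention plugin consumes these lengths instead of a dense mask.
input_lengths = encoded['attention_mask'].sum(dim=1).to(torch.int32)
```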

ehuaa commented 1 year ago

> You are referring to the plugin case, so it uses input_lengths directly instead of reading an attention mask, as I explained above.

Thanks for your quick reply. I've figured out how input_lengths works.