Closed ehuaa closed 1 year ago
The forward function in TensorRT LLM is only used to define the network; it is not really used in inference.
A solution is to prepare a position table for all steps as an input to the network, and then gather the values you want during network building.
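The suggestion above can be sketched as follows. This is a hypothetical illustration in NumPy (not TensorRT-LLM API): a position table covering every step is precomputed on the host and passed in as an extra input, and the network then gathers the rows it needs (in a real engine this would be a gather layer).

```python
import numpy as np

# Hypothetical sketch: precompute a position table for all steps on the
# host and pass it as a network input; names here are illustrative only.
max_steps = 8
hidden = 4

# One row of precomputed position values per step.
position_table = np.tile(np.arange(max_steps, dtype=np.int64)[:, None],
                         (1, hidden))

def gather_positions(table, step_ids):
    # Stand-in for an in-network gather layer: plain row indexing.
    return table[step_ids]

rows = gather_positions(position_table, np.array([0, 3, 5]))
# rows has shape (3, hidden); row 1 is filled with the value 3
```

The point is that the lookup logic lives outside the forward definition, so the network only performs a gather at runtime.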
Thanks for your quick reply. So the better solution for me is to modify run.py in the examples/bert folder and insert position_ids into the inputs dict there, using my RoBERTa-specific method? @byshiue
I think so. You can also implement the function with the APIs we provide, or implement a plugin to support the feature you want. If you want to add a new input as you describe, you also need to modify model.py.
Yes, and model.py in the bert folder shows that position_ids is already an input of the forward function, so I think that is enough for my needs. I'll modify and test it later.
I have another problem here: I see that the bert_attention function has no input field for attention_mask, but the tokenizer of BERT or XLM-RoBERTa returns both input_ids and attention_mask. I wonder whether bert_attention already implements attention_mask or just ignores it? Thanks @byshiue
You mention the plugin case, so it uses input_lengths directly instead of reading the attention mask, as I explained above.
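For anyone hitting the same question: a minimal sketch of how the tokenizer's attention_mask can be reduced to per-sequence input_lengths, assuming right-padded HuggingFace-style masks (1 for real tokens, 0 for padding). This is an illustration, not TensorRT-LLM code.

```python
import numpy as np

# Assumed right-padded attention masks from a HuggingFace tokenizer:
# 1 marks a real token, 0 marks padding.
attention_mask = np.array([
    [1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 0],
], dtype=np.int32)

# The plugin path consumes lengths, so the mask collapses to a row sum.
input_lengths = attention_mask.sum(axis=1).astype(np.int32)
# input_lengths -> [3, 5, 1]
```

So nothing is lost by not passing the mask: for right-padded batches the lengths carry the same information.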
Thanks for your quick reply. I figured out how input_lengths works.
I'm working on deploying a HuggingFace RoBERTa model to TensorRT-LLM, which differs slightly from BERT in its embeddings. In RobertaEmbeddings, position_ids is calculated as follows: I need to convert input_ids to a torch.Tensor to implement the function above. Can you tell me which API achieves this, or how to write this code cleanly? Thanks!
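For reference, the RoBERTa position-id computation being discussed can be sketched like this. It mirrors HuggingFace's create_position_ids_from_input_ids (positions start at padding_idx + 1 and padding tokens keep padding_idx); the sketch uses NumPy for a self-contained example, but in run.py the same arithmetic would be done on the torch tensor returned by the tokenizer.

```python
import numpy as np

def create_position_ids(input_ids, padding_idx=1):
    # Mirrors HuggingFace RoBERTa: non-pad tokens get cumulative positions
    # starting at padding_idx + 1; pad tokens stay at padding_idx.
    mask = (input_ids != padding_idx).astype(np.int64)
    incremental = np.cumsum(mask, axis=1) * mask
    return incremental + padding_idx

# RoBERTa's pad token id is 1; token ids here are illustrative.
ids = np.array([[0, 5, 6, 2, 1, 1]])
pos = create_position_ids(ids)
# pos -> [[2, 3, 4, 5, 1, 1]]
```

Precomputing position_ids this way on the host and feeding them into the inputs dict avoids needing any tensor conversion inside the network definition.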