NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Test Bert with original unittest file test_bert.py + position_ids as input goes wrong #412

Closed ehuaa closed 7 months ago

ehuaa commented 1 year ago

GPU: V100, CUDA version: 12.2

Thanks for your great work. I want to deploy XLMRoberta with TensorRT-LLM, which differs from BERT only by a tweak to the position_ids in bert_embeddings. Following the issue I mentioned here, https://github.com/NVIDIA/TensorRT-LLM/issues/363, @byshiue suggested that I pass position_ids as an input array to the BERT forward function.
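To make the tweak concrete (my own sketch, not code from the model; it assumes XLM-R's padding_idx of 1): BERT's embeddings use a plain arange for position ids, while XLM-RoBERTa derives them from input_ids:

```python
# Sketch: BERT-style position ids are a plain arange; XLM-RoBERTa skips past
# padding_idx and pins padded positions to it.
import torch
from transformers.models.xlm_roberta.modeling_xlm_roberta import create_position_ids_from_input_ids

input_ids = torch.tensor([[0, 31414, 232, 2, 1, 1]])  # 1 = <pad> in XLM-R

bert_style = torch.arange(input_ids.shape[1]).unsqueeze(0)                  # [[0, 1, 2, 3, 4, 5]]
xlmr_style = create_position_ids_from_input_ids(input_ids, padding_idx=1)   # [[2, 3, 4, 5, 1, 1]]
```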

So I simply modified the original unittest file test_bert.py to pass position_ids as an input array and check whether it works. I ran the 3 tests below.

1) The original unittest for test_bert.py: it works well.

2) Pass real data to the original unittest. In this test, I just replace the generated fake data with real data, and modify the hf_bert.forward call to pass attention_mask to the HuggingFace Transformers model. The core modification is here:

```python
tokenizer = AutoTokenizer.from_pretrained('BAAI/bge-reranker-large')
sentence_pairs = [['what is panda?', 'hi'],
                  ['what is panda?', 'The giant panda (Ailuropoda melanoleuca), sometimes called a panda bear or simply panda, is a bear species endemic to China.']]
device_hf = torch.device("cuda")
inputs_hf = tokenizer(sentence_pairs, padding=True, truncation=True, return_tensors='pt', max_length=512).to(device_hf)
```

The result is an error (see the attached screenshot of the mismatch).
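For completeness, the HF reference call then looks roughly like this (a sketch; hf_bert is the HuggingFace model object already used by the test):

```python
# Sketch: run the HF reference model on the real tokens, passing attention_mask
# instead of the fake inputs the original test generates.
with torch.no_grad():
    hf_outputs = hf_bert(input_ids=inputs_hf['input_ids'],
                         attention_mask=inputs_hf['attention_mask'])
```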

The whole test file is here (just a test_bert.py; I cannot upload a single .py file): test_bert_with_real_data.zip

3) Pass user-specific position_ids as an input. All 8 tests failed, with almost 100% mismatch (see the attached screenshot).

The core modification is here:

```python
from transformers.models.xlm_roberta.modeling_xlm_roberta import create_position_ids_from_input_ids
```

(see the attached screenshot for the full diff), and the whole test file is here (just a test_bert.py; I cannot upload a single .py file): test_bert_just_pass_position.zip
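The idea, sketched under my assumptions about the test's session setup (the input names are illustrative), is to compute the XLM-R-style ids on the torch side and feed them as an extra engine input:

```python
# Hedged sketch: build XLM-RoBERTa-style position ids with the HF helper and
# pass them as an additional buffer next to input_ids / input_lengths.
position_ids = create_position_ids_from_input_ids(inputs_hf['input_ids'].cpu(),
                                                  padding_idx=1).int().cuda()
inputs = {
    'input_ids': inputs_hf['input_ids'].int(),
    'input_lengths': inputs_hf['attention_mask'].sum(dim=1).int(),
    'position_ids': position_ids,  # assumes the network declares this extra input
}
# ... then run the session on these buffers as the original test_bert.py does
```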

Can you take a look at my problem? Looking forward to your reply. Thanks!

ehuaa commented 1 year ago

All the zips uploaded above can be run directly under the tests/model folder.

byshiue commented 1 year ago

Here is a document on how to debug. Hope it is helpful.

ehuaa commented 1 year ago

> Here is a document on how to debug. Hope it is helpful.

Thanks for your reply. I have checked this debug tutorial and found that the result goes wrong after the BertEmbedding layer. I wonder whether the embedding layer in BERT does not support special position ids. (It's hard to debug the Embedding layer because it's in C++…) @byshiue

byshiue commented 12 months ago

I am not sure what you mean by "special position ids". But you can print the result of the embedding and check whether it works well or not.

ehuaa commented 12 months ago

> I am not sure what you mean by "special position ids". But you can print the result of the embedding and check whether it works well or not.

I marked the output after adding the position embedding here (screenshot attached), and it is not the same as what HuggingFace Transformers' BERT produces. "Special position ids" means that I pass user-defined position_ids to the function above, calculated with create_position_ids_from_input_ids (see the screenshot), not the same as the normal position_ids pre-defined in https://github.com/NVIDIA/TensorRT-LLM/blob/6837c8141acd036b8884330f4eadb50e097163f7/tensorrt_llm/models/bert/model.py#L48C11-L48C11
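For reference, that helper's logic in transformers is roughly:

```python
# Paraphrased from transformers' XLM-RoBERTa source: ids start at padding_idx + 1
# and padded positions stay at padding_idx, unlike TensorRT-LLM's default arange.
def create_position_ids_from_input_ids(input_ids, padding_idx, past_key_values_length=0):
    mask = input_ids.ne(padding_idx).int()
    incremental_indices = (torch.cumsum(mask, dim=1).type_as(mask) + past_key_values_length) * mask
    return incremental_indices.long() + padding_idx
```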

byshiue commented 12 months ago

You can try marking the results of self.vocab_embedding and self.position_embedding as outputs to check their correctness.
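For example (a rough sketch following the debugging doc's pattern; exact names depend on your model and test):

```python
# Hedged sketch: register the intermediate embedding tensors inside the model's
# forward, then mark them as engine outputs at build time so they can be dumped
# and compared against the HuggingFace reference.
def forward(self, input_ids, position_ids=None, token_type_ids=None):
    vocab_out = self.vocab_embedding(input_ids)
    pos_out = self.position_embedding(position_ids)
    self.register_network_output('vocab_embedding_out', vocab_out)
    self.register_network_output('position_embedding_out', pos_out)
    # ... rest of the embedding forward unchanged

# at build time in the test:
# for name, tensor in model.named_network_outputs():
#     network._mark_output(tensor, name, tensorrt_llm.str_dtype_to_trt('float32'))
```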

Arrivederci commented 11 months ago

@ehuaa Have you figured this out? I ran into the same problem.

byshiue commented 7 months ago

Closing this bug because it has been inactive. Please reopen it if needed.