NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

The engine generated by each build has different results for the same input. #2148

Closed 1096125073 closed 2 weeks ago

1096125073 commented 2 months ago

System Info

trt-llm v0.9.0

Who can help?

@byshiue

Information

Tasks

Reproduction

  1. Build the engine for test 1.
  2. Build the engine for test 2.
  3. Run the above two engines with the same input.
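One way to make the comparison in step 3 concrete is to check whether the two builds produced bit-identical engine files before ever running inference. The sketch below hashes the serialized engines; the paths are illustrative and assume the default per-rank engine filename, which may differ across TensorRT-LLM versions.

```python
import hashlib
import os


def sha256_of(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


# Illustrative paths: substitute the output dirs of the two builds.
engine_a = "engine_test1/rank0.engine"
engine_b = "engine_test2/rank0.engine"

if __name__ == "__main__" and os.path.exists(engine_a) and os.path.exists(engine_b):
    if sha256_of(engine_a) == sha256_of(engine_b):
        print("engines are bit-identical")
    else:
        print("engines differ; builds were not deterministic")
```

If the hashes already differ, the output divergence is expected: TensorRT's builder auto-tunes kernel tactics by timing them, so two builds can select different kernels unless tactic choices are pinned (e.g. via a timing cache).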

Expected behavior

The outputs are the same.

actual behavior

The outputs are not the same.

additional notes

I tried to use the model cache when building, but it didn't work.

1096125073 commented 2 months ago

Is there any way to ensure that the engine generated by each build is identical? This is important for engineering deployment.

lfr-0531 commented 2 months ago

Can you provide more details, e.g. the commands, that can help us reproduce this issue?

qiancheng99 commented 2 months ago

> Can you provide more details, i.e the cmds, which can help us reproduce this issue?

I encountered the same issue. I ran trtllm-build twice with everything identical, but the inference results are slightly different between the two models for the same input. I also found a similar problem reported by others: https://github.com/NVIDIA/TensorRT-LLM/issues/2196

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

github-actions[bot] commented 2 weeks ago

This issue was closed because it has been stalled for 15 days with no activity.