NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
8.71k stars 996 forks

Consistent Output with Same Prompts #2411

Open ZhenboYan opened 2 weeks ago

ZhenboYan commented 2 weeks ago

Description I ran the provided example script examples/llm-api/quickstart_example.py and observed that the output is identical across multiple runs.

Steps to Reproduce

  1. Run the quickstart_example.py script multiple times with default settings inside the nvcr.io/nvidia/tensorrt:24.10-py3 container.
  2. Modify top_p and temperature values.
  3. Run the script again multiple times.

Observed Behavior The output remains identical across every run, even after changing top_p and temperature.

Expected Behavior With sampling enabled (fixed temperature and top_p), the outputs should vary across runs.

Question How can I configure the model or script to produce non-deterministic outputs with fixed temperature and top_p?

syuoni commented 2 weeks ago

Hi @ZhenboYan ,

If no random seed is provided to SamplingParams, TRT-LLM generates a default, fixed seed for sampling. That is why you observed identical generation outputs across different runs. To get different sampling outputs, provide a different random seed on each run. Thanks!
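A minimal sketch of the suggestion above, adapted from the quickstart example: draw a fresh seed each run and pass it to SamplingParams while keeping temperature and top_p fixed. The model name is illustrative, and the seed argument is assumed to be named `seed` (older TensorRT-LLM releases used `random_seed`).

```python
# Sketch: vary the sampling seed per run so outputs differ across runs.
# Assumes the TensorRT-LLM LLM API from quickstart_example.py; the
# model name below is a placeholder, not from the original issue.
import random

from tensorrt_llm import LLM, SamplingParams


def main():
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model

    # Fixed temperature/top_p, but a fresh seed on every invocation,
    # so repeated runs sample different token sequences.
    sampling_params = SamplingParams(
        temperature=0.8,
        top_p=0.95,
        seed=random.randrange(2**32),  # assumed kwarg; `random_seed` in older releases
    )

    outputs = llm.generate(["Hello, my name is"], sampling_params)
    for out in outputs:
        print(out.outputs[0].text)


if __name__ == "__main__":
    main()
```

Note that passing the same fixed seed instead would reproduce the original behavior: identical outputs on every run, which can be useful for debugging.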