NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0
8.71k stars 996 forks

Consistent Output with Same Prompts #2411

Open ZhenboYan opened 2 weeks ago

ZhenboYan commented 2 weeks ago

Description I ran the provided example script examples/llm-api/quickstart_example.py and observed that the output is identical across multiple runs.

Steps to Reproduce

  1. Run the quickstart_example.py script multiple times with default settings inside the nvcr.io/nvidia/tensorrt:24.10-py3 container.
  2. Modify top_p and temperature values.
  3. Run the script again multiple times.

Observed Behavior The output remains identical across every run, even after changing top_p and temperature.

Expected Behavior With sampling enabled (fixed temperature and top_p), the outputs should vary across runs.

Question How can I configure the model or script to produce non-deterministic outputs with fixed temperature and top_p?

syuoni commented 2 weeks ago

Hi @ZhenboYan ,

If no random seed is provided to SamplingParams, TRT-LLM generates a default, fixed seed for sampling. That is why you observed identical generation outputs across different runs. To get different sampling outputs, provide a different random seed on each run. Thanks!
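A minimal sketch of the suggestion above, adapted from the quickstart example: draw a fresh seed each run and pass it to SamplingParams while keeping temperature and top_p fixed. The model name is illustrative, and the seed argument is assumed to be named `seed` (older TensorRT-LLM releases used `random_seed`).

```python
# Sketch: vary the sampling seed per run so outputs differ across runs.
# Assumes the TensorRT-LLM LLM API from quickstart_example.py; the
# model name below is a placeholder, not from the original issue.
import random

from tensorrt_llm import LLM, SamplingParams


def main():
    llm = LLM(model="TinyLlama/TinyLlama-1.1B-Chat-v1.0")  # placeholder model

    # Fixed temperature/top_p, but a fresh seed on every invocation,
    # so repeated runs sample different token sequences.
    sampling_params = SamplingParams(
        temperature=0.8,
        top_p=0.95,
        seed=random.randrange(2**32),  # assumed kwarg; `random_seed` in older releases
    )

    outputs = llm.generate(["Hello, my name is"], sampling_params)
    for out in outputs:
        print(out.outputs[0].text)


if __name__ == "__main__":
    main()
```

Note that passing the same fixed seed instead would reproduce the original behavior: identical outputs on every run, which can be useful for debugging.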