abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

top_p = 1 causes deterministic outputs #1797

Open oobabooga opened 1 month ago

oobabooga commented 1 month ago

Setting top_p = 1 causes outputs to be identical even with a random seed. This was first reported in https://github.com/oobabooga/text-generation-webui/issues/6431#issuecomment-2409089861; see the full discussion at https://github.com/oobabooga/text-generation-webui/issues/6431.

Reproduction

from llama_cpp import Llama

# Load the model
model = Llama(
    model_path="models/Meta-Llama-3-8B-Instruct-Q4_K_S-HF/Meta-Llama-3-8B-Instruct-Q4_K_S.gguf",
    n_gpu_layers=128,
)

# Define the prompt
prompt = "Once upon a time"

for i in range(5):
    # Generate text with temperature = 1, top_p = 1, and a random seed (seed=-1)
    completion = model.create_completion(prompt=prompt, max_tokens=50, temperature=1.0, top_p=1.0, seed=-1)

    # Print the generated text
    print(completion['choices'][0]['text'])

The 5 outputs will be identical.

Verified with llama-cpp-python==0.3.1.

jim-plus commented 4 weeks ago

For now, setting top_p to 0.99 serves as a simple workaround.
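
A minimal sketch of that workaround, reusing the reproduction from the original report (same model path; only top_p is changed from 1.0 to 0.99):

from llama_cpp import Llama

# Load the model (path taken from the reproduction above)
model = Llama(
    model_path="models/Meta-Llama-3-8B-Instruct-Q4_K_S-HF/Meta-Llama-3-8B-Instruct-Q4_K_S.gguf",
    n_gpu_layers=128,
)

prompt = "Once upon a time"

for i in range(5):
    # With top_p just below 1.0 the issue described above should not trigger,
    # so seed=-1 is expected to vary the output between iterations
    completion = model.create_completion(prompt=prompt, max_tokens=50, temperature=1.0, top_p=0.99, seed=-1)
    print(completion['choices'][0]['text'])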

m-from-space commented 4 weeks ago

The 5 outputs will be identical.

In your example you use seed=-1. Could you confirm that when not using top_p=1.0 the 5 outputs are different, but that you get the same 5 outputs on every run? I opened an issue about this here; the behavior was not present in llama-cpp-python==0.2.9.
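
For reference, a minimal sketch (assuming the same model path as the reproduction above, with top_p=0.99 and seed=-1 as in the question) that could be run twice to check this: it prints the 5 completions plus a hash of them, so two separate runs can be compared at a glance.

import hashlib

from llama_cpp import Llama

model = Llama(
    model_path="models/Meta-Llama-3-8B-Instruct-Q4_K_S-HF/Meta-Llama-3-8B-Instruct-Q4_K_S.gguf",
    n_gpu_layers=128,
)

prompt = "Once upon a time"

outputs = []
for i in range(5):
    # top_p below 1.0 and seed=-1, as described in the question above
    completion = model.create_completion(prompt=prompt, max_tokens=50, temperature=1.0, top_p=0.99, seed=-1)
    outputs.append(completion['choices'][0]['text'])

for text in outputs:
    print(text)

# If two separate runs of this script print the same hash, the outputs vary within a run
# but are the same 5 outputs every run; different hashes mean seed=-1 also randomizes across runs.
print(hashlib.sha256("".join(outputs).encode("utf-8")).hexdigest())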