abetlen / llama-cpp-python

Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License

Setting seed to -1 (random) or using default LLAMA_DEFAULT_SEED generates a deterministic reply chain #1809

Open · m-from-space opened this issue 3 weeks ago

m-from-space commented 3 weeks ago

INB4: This is not about setting Top-P to 1.0, which causes the same output every time for the same prompt, as documented here: https://github.com/abetlen/llama-cpp-python/issues/1797

When loading a model with release 0.3.1 and setting the seed to either -1 (random) or leaving it at the default (which, according to the docs, is supposed to use an RNG), the model's first reply is always the same for the same prompt. Consecutive replies for the same prompt differ, but the whole chain is identical every time the model is loaded and the steps are repeated.

This points toward the seed not being randomized at load time.
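A minimal reproduction sketch (the model path and prompt are placeholders; top_p is kept below 1.0 to stay clear of the separate issue #1797):

```python
# Reproduction sketch: load with seed=-1, ask once, print the reply.
from llama_cpp import Llama

llm = Llama(model_path="./model.gguf", seed=-1, verbose=False)

out = llm(
    "Q: Name a random animal.\nA:",
    max_tokens=16,
    temperature=0.8,
    top_p=0.95,  # < 1.0, so this is not the #1797 determinism
)
print(out["choices"][0]["text"])
# Run this script twice in fresh processes: with a truly random seed the
# two first replies should usually differ; on the affected versions they
# reportedly come out identical.
```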

What I expect (and what worked earlier): loading a model with a random seed generates a different first reply for the same prompt.

The issue is not present in version llama-cpp-python==0.2.90
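Until this is fixed, a possible workaround (a sketch under the assumption that explicitly passed seeds are honored, not a confirmed fix) is to draw the seed on the Python side instead of relying on -1 or the default:

```python
# Workaround sketch: pass an explicit, freshly drawn seed to Llama()
# rather than using -1 / LLAMA_DEFAULT_SEED.
import random

from llama_cpp import Llama

seed = random.randint(0, 2**31 - 1)  # any fresh positive 32-bit value
llm = Llama(model_path="./model.gguf", seed=seed, verbose=False)
```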

m-from-space commented 2 days ago

The problem is still present in llama-cpp-python==0.3.2.