ggerganov / llama.cpp

LLM inference in C/C++
MIT License

Bug: logit_bias Persists Across Requests When cache_prompt Is Enabled in llama.cpp Server #9477

Open · jeanromainroy opened this issue 5 days ago

jeanromainroy commented 5 days ago

What happened?

When using the llama.cpp server with cache_prompt enabled, I've encountered an issue where the logit_bias specified in one request persists and influences subsequent requests, even when those requests do not include any logit_bias. This results in unexpected, biased outputs in later requests, where the model continues to favor tokens from a previous logit_bias setting.

Expected Behavior:

Each request's logit_bias should apply only to that request. A subsequent request that omits logit_bias should be sampled without any bias, even when cache_prompt is enabled.

Steps to Reproduce:

  1. Start the llama.cpp server with cache_prompt enabled.

  2. First Request with logit_bias:

    {
      "prompt": "Is the sky blue?\nAnswer with 'Yes', 'No', or 'N/A':",
      "max_tokens": 1,
      "logit_bias": [["Yes", 20], ["No", 20], ["N/A", 20]],
      "cache_prompt": true
    }
    • Expected Output: "Yes", "No", or "N/A".
    • Actual Output: "Yes" (as expected).
  3. Second Request without logit_bias:

    {
      "prompt": "Is the sky blue?",
      "max_tokens": 10,
      "cache_prompt": true
    }
    • Expected Output: An unbiased response based solely on the prompt.
    • Actual Output: The model outputs "Yes" or otherwise favors the previously biased tokens, indicating that the logit_bias from the first request is still being applied (see the reproduction sketch after these steps).
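
For completeness, a minimal script along these lines can drive the two requests above. This is only a sketch under assumptions not stated in the report: the server is reachable at http://localhost:8080, the native /completion endpoint is used, and the parameter names are kept exactly as in the requests (depending on the API flavor, n_predict may be needed instead of max_tokens).

    # Sketch of a reproduction client; assumes a llama.cpp server is already
    # running on localhost:8080 and exposes the native /completion endpoint.
    import json
    import urllib.request

    URL = "http://localhost:8080/completion"  # assumed host/port/endpoint

    def post(payload):
        """POST a JSON payload to the completion endpoint and return the parsed reply."""
        req = urllib.request.Request(
            URL,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    # Request 1: logit_bias plus cache_prompt, exactly as in the report.
    first = post({
        "prompt": "Is the sky blue?\nAnswer with 'Yes', 'No', or 'N/A':",
        "max_tokens": 1,
        "logit_bias": [["Yes", 20], ["No", 20], ["N/A", 20]],
        "cache_prompt": True,
    })
    print("request 1:", first.get("content"))

    # Request 2: no logit_bias; with per-request biasing this should be unbiased.
    second = post({
        "prompt": "Is the sky blue?",
        "max_tokens": 10,
        "cache_prompt": True,
    })
    print("request 2:", second.get("content"))

Comparing the second reply from a fresh server start against a run made immediately after the biased request should make any lingering bias visible.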

Name and Version

./llama-cli --version
version: 3733 (1b280614)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.5.0

What operating system are you seeing the problem on?

No response

Relevant log output

No response

slaren commented 5 days ago

I cannot reproduce this. It is easy to test that the logit biases are applied in every request by giving a specific token a very high bias, effectively ensuring that it will be selected.
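
A sanity check along those lines could look like the sketch below (the endpoint, port, and the bias value of 100 are assumptions; 100 is simply chosen to be large enough to dominate sampling):

    # Give one token an overwhelming bias, then repeat the request without any
    # bias; if biases are applied per request, only the first reply is forced.
    import json
    import urllib.request

    def post(payload, url="http://localhost:8080/completion"):
        req = urllib.request.Request(
            url,
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())

    # Heavily biased request: should essentially always answer "No".
    print("biased:  ", post({
        "prompt": "Is the sky blue?",
        "max_tokens": 1,
        "logit_bias": [["No", 100]],
        "cache_prompt": True,
    }).get("content"))

    # Follow-up without logit_bias: should sample normally if the bias is per-request.
    print("unbiased:", post({
        "prompt": "Is the sky blue?",
        "max_tokens": 10,
        "cache_prompt": True,
    }).get("content"))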