What happened?

When using the llama.cpp server with cache_prompt enabled, I've encountered an issue where the logit_bias specified in one request persists and influences subsequent requests, even when those later requests do not include any logit_bias. The result is unexpectedly biased output: the model keeps favoring the tokens from an earlier request's logit_bias setting.
Expected Behavior:
A logit_bias specified in one request should not affect other requests.
Enabling cache_prompt should not cause parameters like logit_bias to carry over between requests.
Steps to Reproduce:
Start the llama.cpp server with cache_prompt enabled.
First Request with logit_bias:
{
  "prompt": "Is the sky blue?\nAnswer with 'Yes', 'No', or 'N/A':",
  "max_tokens": 1,
  "logit_bias": [["Yes", 20], ["No", 20], ["N/A", 20]],
  "cache_prompt": true
}
Second Request without logit_bias:
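The body of the second request is not shown above; presumably it is the same prompt and cache_prompt setting with the logit_bias field simply omitted. As a rough reproduction sketch (the llama-server invocation, the default port 8080, the /completion endpoint, and the request file names are my assumptions, not details from the report), the two requests can be sent back-to-back like this:

# start the server (model path is a placeholder)
./llama-server -m /path/to/model.gguf

# request-1.json holds the first body shown above (with logit_bias and cache_prompt)
# request-2.json is identical except that the logit_bias field is omitted entirely
curl -s http://localhost:8080/completion -H 'Content-Type: application/json' --data @request-1.json
curl -s http://localhost:8080/completion -H 'Content-Type: application/json' --data @request-2.json

# if the reported bug is present, the second response still favors the
# 'Yes'/'No'/'N/A' tokens even though request-2.json sends no logit_bias
# (use the OpenAI-compatible /v1/completions route instead if that is what you are testing)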
Name and Version

./llama-cli --version
version: 3733 (1b280614)
built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.5.0

What operating system are you seeing the problem on?

No response

Relevant log output

No response

I cannot reproduce this. It is easy to test that the logit biases are applied in every request by giving a specific token a very high bias, effectively ensuring that it will be selected.
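One quick way to run that check (again only a sketch; the endpoint, port, n_predict field, and the +100 bias value are illustrative assumptions) is to give a single token an extreme bias and confirm it comes back on every request:

# bias the token "No" so heavily that it should be selected essentially every time
# n_predict limits output length on the native /completion endpoint; adjust if you
# are going through the OpenAI-compatible route instead
curl -s http://localhost:8080/completion -H 'Content-Type: application/json' \
  --data '{"prompt": "Is the sky blue?", "n_predict": 1, "cache_prompt": true, "logit_bias": [["No", 100]]}'

# sending the same prompt again without logit_bias shows whether the earlier
# bias is still being applied to the new request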