COXIT-CO / dont_trust_ai


Investigate `seed` and `top_p` parameters in OpenAI /chat/completions endpoint to minimise randomness. #6

Open iramykytyn opened 2 weeks ago

iramykytyn commented 2 weeks ago

Try to minimize randomness in GPT responses with the `seed` or `top_p` parameters. Also monitor whether the `system_fingerprint` parameter changes and how often: if it changes whenever the response changes, we are dealing with an updated API backend, so the difference is probably not random variation in the response but an LLM change that we should handle differently.

https://platform.openai.com/docs/api-reference/chat/create

See if there are similar parameters in the Claude API.

Report results in comments or create an article.

eLQeR commented 1 week ago

Minimizing randomness is described in a Google Docs write-up; the article is reproduced below.

Minimizing Randomness in GPT Responses: A Guide to Using Seeds, Top_p, and Monitoring System Fingerprints

When working with large language models (LLMs) like GPT, a major challenge is ensuring consistency in responses. These models are inherently stochastic, meaning randomness is embedded in their behavior. However, in certain applications, such as testing or content generation where consistency is crucial, it becomes essential to minimize this variability. In this article, we'll explore how to control randomness using parameters like seeds and top_p, and why monitoring the system_fingerprint parameter is critical. We'll also discuss why achieving full determinism in LLM responses is ultimately impossible.

1. Why Full Determinism Is Impossible

While setting a seed and controlling parameters like top_p can help reduce randomness, true determinism in LLM responses is impossible to achieve for several reasons: GPU floating-point arithmetic is not fully deterministic (results can differ across hardware and batch compositions), requests are load-balanced across a distributed backend, and the backend itself changes over time, which is exactly what the system_fingerprint field is meant to expose.

2. Using a Seed to Control Randomness

One of the most effective ways to reduce variability is by setting a seed. This ensures that, for a given input, the model produces the same output each time. While this helps create repeatability, it’s important to remember that due to the non-deterministic nature of LLMs, even this won’t guarantee perfect consistency across different API calls or versions of the model (especially if the system_fingerprint changes).
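
As a minimal sketch (not part of the original write-up), the snippet below assumes the openai Python SDK (v1.x) and an illustrative model name and prompt: it fixes a `seed`, repeats the same request, and records `system_fingerprint`, so that a changed answer can be attributed either to sampling or to a backend change.

```python
# Sketch: fixed seed + system_fingerprint monitoring (model name is illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask(prompt: str, seed: int = 42):
    """Send a prompt with a fixed seed; return (answer, system_fingerprint)."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user", "content": prompt}],
        seed=seed,            # fixed seed to encourage repeatable sampling
        temperature=0,        # remove as much sampling randomness as possible
    )
    return response.choices[0].message.content, response.system_fingerprint


first_answer, first_fp = ask("Name three prime numbers.")
second_answer, second_fp = ask("Name three prime numbers.")

# If the fingerprints differ, the backend changed; a differing answer is then
# expected and should not be treated as "random" drift.
print("same answer:", first_answer == second_answer)
print("fingerprints:", first_fp, second_fp)
```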

3. Controlling Output Diversity with Top_p

The top_p parameter controls the probability distribution from which the model selects the next token. Setting top_p to a lower value restricts the model to a smaller pool of possible token choices, narrowing down the output variability.

However, lowering top_p too much can lead to unintended consequences: outputs become repetitive and overly terse, the model is cut off from lower-probability but sometimes better token choices, and tasks that benefit from varied phrasing or creativity tend to suffer.
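
A short sketch under the same assumptions (openai Python SDK, illustrative model name) showing how top_p is passed and how a tight nucleus compares with the default:

```python
# Sketch: comparing a restrictive top_p against the full distribution.
from openai import OpenAI

client = OpenAI()


def complete(prompt: str, top_p: float) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative
        messages=[{"role": "user", "content": prompt}],
        top_p=top_p,  # sample only from the smallest token set whose cumulative probability >= top_p
    )
    return response.choices[0].message.content


narrow = complete("Suggest a name for a coffee shop.", top_p=0.1)  # less varied, can get repetitive
broad = complete("Suggest a name for a coffee shop.", top_p=1.0)   # full distribution
print(narrow)
print(broad)
```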

4. Temperature and Top_p: Avoid Using Together

Both temperature and top_p control the randomness of the model’s responses, but in different ways: temperature rescales the entire next-token probability distribution (lower values concentrate probability on the most likely tokens), while top_p performs nucleus sampling, restricting choices to the smallest set of tokens whose cumulative probability reaches p.

According to the documentation of both GPT and Claude, using temperature and top_p together can lead to unpredictable behavior and poor performance. When both are set simultaneously, the combined effect may confuse the model, leading to degraded reasoning abilities and lower-quality outputs. For example, using both parameters might significantly hamper tasks that require detailed reasoning, like multi-step logic or chain of thought processes.
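
On the Claude side, a hedged sketch assuming the anthropic Python SDK and an illustrative model name: the Messages API exposes temperature, top_p, and top_k, but (at the time of writing) no seed parameter, so only one sampling knob is set here, in line with the recommendation above.

```python
# Sketch: equivalent knobs in the Claude Messages API (no seed parameter available).
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-latest",  # illustrative model name
    max_tokens=256,
    temperature=0,  # set temperature OR top_p, not both
    messages=[{"role": "user", "content": "Name three prime numbers."}],
)
print(message.content[0].text)
```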

5. Additional Information on Temperature and n Parameter

When the n parameter was set and the GPT-powered chat generated three responses for a single input, the behavior was as follows:

Example with temperature = 0, n = 3: [image]

Example with temperature = 0.15, n = 3: [image]
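
The experiment can be reproduced with a sketch like the following (same openai SDK assumptions; the model name and prompt are illustrative): request n = 3 completions in a single call and check whether the candidates match.

```python
# Sketch: n = 3 completions for one input, compared for equality.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative
    messages=[{"role": "user", "content": "Summarize the plot of Hamlet in one sentence."}],
    n=3,             # ask for three candidate completions in one call
    temperature=0,   # with temperature 0 the candidates are usually, but not always, identical
)

answers = [choice.message.content for choice in response.choices]
for i, answer in enumerate(answers, start=1):
    print(f"candidate {i}: {answer}")
print("all identical:", len(set(answers)) == 1)
```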

Unverified theory: using OpenRouter may introduce slightly more randomness than calling OpenAI directly. This could be because OpenRouter routes requests to different LLM providers and uses different systems to handle heavy loads, whereas OpenAI manages its own infrastructure; that difference could contribute to additional variability in responses.

Conclusion

Minimizing randomness in GPT responses is crucial in scenarios where consistency is needed, but true determinism remains elusive due to the non-deterministic nature of the model and variability in the system_fingerprint. By carefully controlling temperature, seeds, and top_p settings, you can reduce randomness, though you must be cautious about over-constraining the model, as it may degrade performance in complex tasks. Monitoring system_fingerprint changes is also key to distinguishing random variation from API backend updates, helping you manage LLM behavior more effectively.

References

The “seed” option for GPT does not increase the determinism level

ChatCompletions are not deterministic even with seed set, temperature=0, top_p=0, n=1

iramykytyn commented 4 days ago

left comments in article, please consider updating it