The pattern used in various places is:

```python
with llm_params(llm, **params):
    result = await llm_call(llm, prompt)
```
However, when multiple requests run in parallel and one of them sets, for example, `max_tokens`, the change leaks into the other in-flight requests as well. We need to rethink/fix the underlying implementation.
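To make the failure mode concrete, here is a minimal sketch of the leak. The `LLM` class and delays are toy stand-ins (not the real client); the context manager mutates the shared instance and restores it on exit, which is the pattern suspected above:

```python
import asyncio
from contextlib import contextmanager

class LLM:
    # toy stand-in for the shared client; hypothetical field
    def __init__(self):
        self.max_tokens = 1024

@contextmanager
def llm_params(llm, **params):
    # suspected buggy approach: mutate the shared instance, restore on exit
    saved = {k: getattr(llm, k) for k in params}
    for k, v in params.items():
        setattr(llm, k, v)
    try:
        yield llm
    finally:
        for k, v in saved.items():
            setattr(llm, k, v)

async def call_with(llm, max_tokens, delay):
    with llm_params(llm, max_tokens=max_tokens):
        await asyncio.sleep(delay)   # simulate the network round-trip
        return llm.max_tokens        # value the "request" actually observes

async def main():
    llm = LLM()
    # two parallel requests with different overrides
    return await asyncio.gather(
        call_with(llm, max_tokens=10, delay=0.01),
        call_with(llm, max_tokens=500, delay=0.02),
    )

results = asyncio.run(main())
print(results)  # [500, 1024] — neither request sees its own setting
```

The first request observes the second request's `max_tokens`, and the second observes the default restored by the first's `finally` block: the save/restore logic itself races once calls interleave.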
- Instance isolation: are all parallel requests sharing a single `llm` instance?
- Use thread-local or task-local context for parameter overrides.
- Refactor `llm_params` to avoid mutating shared state. For example, instead of modifying `llm` directly, `llm_params` could return a modified copy of `llm`, or apply the parameters only within the scope of each call.
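One way to combine the task-local and no-shared-mutation ideas is `contextvars`: each asyncio task gets its own copy of the context, so overrides set inside one request never leak into another. This is a sketch under assumptions, not the project's actual API; `effective_params`, the `LLM` class, and the field names are hypothetical:

```python
import asyncio
from contextlib import contextmanager
from contextvars import ContextVar

# task-local parameter overrides; each asyncio task sees its own value
_overrides: ContextVar[dict] = ContextVar("llm_param_overrides", default={})

@contextmanager
def llm_params(llm, **params):
    # layer the new params on top of whatever this task already set;
    # the shared llm instance is never mutated
    token = _overrides.set({**_overrides.get(), **params})
    try:
        yield llm
    finally:
        _overrides.reset(token)

def effective_params(llm):
    # defaults from the shared instance, overlaid with task-local overrides
    base = {"max_tokens": llm.max_tokens}
    return {**base, **_overrides.get()}

class LLM:
    # toy stand-in for the shared client; hypothetical field
    def __init__(self):
        self.max_tokens = 1024

async def llm_call(llm, prompt):
    params = effective_params(llm)
    await asyncio.sleep(0.01)  # simulate the network round-trip
    return params["max_tokens"]

async def request(llm, max_tokens):
    with llm_params(llm, max_tokens=max_tokens):
        return await llm_call(llm, "hi")

async def main():
    llm = LLM()
    return await asyncio.gather(request(llm, 10), request(llm, 500))

results = asyncio.run(main())
print(results)  # [10, 500] — each task sees only its own override
```

`asyncio.gather` creates each coroutine as a task with a copy of the current context, so a `ContextVar.set` inside one request is invisible to the others; the same mechanism also works for threads.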