NVIDIA / NeMo-Guardrails

NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems.

`llm_params` behavior can break on parallel requests #726

Open drazvan opened 2 months ago

drazvan commented 2 months ago

The pattern used in various places is:

with llm_params(llm, **params):
    result = await llm_call(llm, prompt)

However, when multiple requests run in parallel and one of them sets, for example, `max_tokens`, the change seems to affect the other parallel requests as well. We need to rethink/fix the underlying implementation.
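To make the suspected failure mode concrete, here is a minimal, self-contained sketch. It assumes that the context manager works by temporarily overwriting attributes on the shared `llm` object and restoring them on exit; the issue does not confirm this, so treat it as an illustration of the race rather than the actual NeMo Guardrails code. `FakeLLM`, `mutating_params`, and the two request coroutines are hypothetical names invented for this example.

```python
import asyncio
from contextlib import contextmanager


class FakeLLM:
    """Hypothetical stand-in for an LLM client shared by all requests."""

    def __init__(self):
        self.max_tokens = 256


@contextmanager
def mutating_params(llm, **params):
    """Assumed behavior: overwrite attributes on the shared object, restore on exit."""
    saved = {name: getattr(llm, name) for name in params}
    for name, value in params.items():
        setattr(llm, name, value)
    try:
        yield llm
    finally:
        for name, value in saved.items():
            setattr(llm, name, value)


async def request_with_override(llm, entered, release):
    with mutating_params(llm, max_tokens=10):
        entered.set()          # the override is now visible on the shared object
        await release.wait()   # stands in for awaiting the actual LLM call
        return llm.max_tokens


async def request_with_defaults(llm, entered, release):
    await entered.wait()       # runs while the other request is still suspended
    seen = llm.max_tokens      # the value this request would send to the model
    release.set()
    return seen


async def main():
    llm = FakeLLM()
    entered, release = asyncio.Event(), asyncio.Event()
    overridden, defaults = await asyncio.gather(
        request_with_override(llm, entered, release),
        request_with_defaults(llm, entered, release),
    )
    return overridden, defaults, llm.max_tokens


overridden, defaults, restored = asyncio.run(main())
print(overridden, defaults, restored)
```

Under that assumption, the second request never asks for an override but still observes `max_tokens=10`, because the `await` inside the `with` block yields control while the shared object is in its modified state. The value is only restored once the first request leaves the context manager.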

Pouyanpi commented 1 week ago

Potential fixes: