nhs-work opened this issue 1 day ago
Based on preliminary investigation and the logs, the code flow appears to be as follows (feel free to correct me if I'm wrong):

Request to LiteLLM:
(log line) Set Async Redis Cache: key list:
https://github.com/BerriAI/litellm/blob/e19bb55e3b4c6a858b6e364302ebbf6633a51de5/litellm/caching.py#L471

It seems like the ttl configs are not being correctly passed down?
What happened?
LiteLLM versions tested:
- main-v1.40.9-stable
- main-v1.44.22-stable
- main-v1.48.7-stable
Set up:
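The original config snippet was not preserved in this report. As a stand-in, here is a minimal sketch of the kind of proxy config implied (a Redis cache with a 600-second TTL), assuming LiteLLM's standard `cache_params` proxy-config layout; host, port, and values are illustrative:

```yaml
# Illustrative proxy config, not the reporter's original file.
# Assumes litellm's cache_params layout; adjust host/port for your deployment.
litellm_settings:
  cache: true
  cache_params:
    type: redis
    host: localhost
    port: 6379
    ttl: 600   # expected expiry for cached entries, in seconds
```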
What is observed?
Calling LiteLLM via the following curl command correctly sets the TTL to be 600 based on the configs above:
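(The exact command was not preserved in this report; the following is an illustrative reconstruction, assuming the proxy's OpenAI-compatible `/embeddings` route on `localhost:4000`. The model name and API key are placeholders.)

```shell
# Hypothetical reconstruction, not the reporter's exact command.
# String input: the configured TTL (600) is applied correctly.
curl -s http://localhost:4000/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{"model": "text-embedding-ada-002", "input": "hello world"}'
```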
However, the following curl command (note that `input` is now an array) sets the TTL as `-1`.
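(Again, the exact command was not preserved; this is an illustrative reconstruction under the same assumptions as above, with `input` changed to an array.)

```shell
# Hypothetical reconstruction, not the reporter's exact command.
# Array input: the cache entry is written with no expiry (TTL -1).
curl -s http://localhost:4000/embeddings \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{"model": "text-embedding-ada-002", "input": ["hello world", "goodbye world"]}'
```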
Note that the issue occurs for all valid non-string values for `input`, which are `List[str] | List[int] | List[List[int]]` based on the OpenAI spec.

Why is this an issue?
Users using Langchain might call LiteLLM via `OpenAIEmbeddings`, which provides an array of tokens (int), ref. This results in the cache filling up with key/value pairs that have no expiry, eventually causing the cache to run out of space. I have also noticed that when Redis runs out of space in this manner, the LiteLLM pods tend to get stuck and restart whenever multiple calls are made.
Relevant log output
Twitter / LinkedIn details
No response