BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: TTL not being set for cached batch requests #6010

Open nhs-work opened 1 day ago

nhs-work commented 1 day ago

What happened?

LiteLLM versions tested: main-v1.40.9-stable, main-v1.44.22-stable, main-v1.48.7-stable

Set up:

  litellm_settings:
    cache: true
    cache_params:
      type: redis
      ttl: 600
      default_in_memory_ttl: 600
      default_in_redis_ttl: 600
    json_logs: true
    set_verbose: true
  router_settings:
    routing_strategy: simple-shuffle
    enable_pre_call_checks: true
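With these settings, every Redis cache write should end up with a TTL: a per-request `ttl` if one is passed, otherwise the configured defaults. A minimal sketch of that expected resolution logic (illustrative only; the function name and structure are my assumptions, not LiteLLM's actual code):

```python
from typing import Optional

def resolve_redis_ttl(request_ttl: Optional[int], cache_params: dict) -> Optional[int]:
    """Pick the TTL for a Redis cache write: a per-request value wins,
    otherwise fall back to the configured defaults."""
    if request_ttl is not None:
        return request_ttl
    # default_in_redis_ttl is more specific than the generic ttl
    for key in ("default_in_redis_ttl", "ttl"):
        if cache_params.get(key) is not None:
            return int(cache_params[key])
    return None  # nothing configured -> key never expires

cache_params = {"type": "redis", "ttl": 600, "default_in_redis_ttl": 600}
print(resolve_redis_ttl(None, cache_params))  # 600: falls back to config
print(resolve_redis_ttl(60, cache_params))    # 60: per-request override
```

Under the config above, no write should ever fall through to `None`, which is why the `ttl=None` seen later in the logs looks like a bug.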

What is observed?

Calling LiteLLM via the following curl command correctly sets the TTL to 600, as configured above:

curl --location 'https://litellm.com/openai/deployments/text-embedding-3-small/embeddings' \
    -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/81.0" \
    --header 'Content-Type: application/json' \
    -H "Authorization: Bearer sk-xxx" \
    --data '{
    "model": "text-embedding-3-small",
    "input": "embed this text"
}'

However, the following curl command (note that input is now an array) leaves the key with a TTL of -1, i.e. no expiry:

curl --location 'https://litellm.com/openai/deployments/text-embedding-3-small/embeddings' \
    -A "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/81.0" \
    --header 'Content-Type: application/json' \
    -H "Authorization: Bearer sk-xxx" \
    --data '{
    "model": "text-embedding-3-small",
    "input": ["embed this text"]
}'

Note that the issue occurs for all valid non-string values of input, which per the OpenAI embeddings API are: List[str] | List[int] | List[List[int]].
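The boundary between the working and broken cases is exactly str vs. the three list shapes. A small helper (the name is mine, purely for illustration) makes the distinction concrete:

```python
def classify_embedding_input(x):
    """Classify an OpenAI embeddings `input` value into one of the
    four accepted shapes: str, List[str], List[int], List[List[int]]."""
    if isinstance(x, str):
        return "str"  # the only shape for which the TTL is applied correctly
    if isinstance(x, list) and x:
        if all(isinstance(i, str) for i in x):
            return "List[str]"
        if all(isinstance(i, int) for i in x):
            return "List[int]"
        if all(isinstance(i, list) and all(isinstance(j, int) for j in i) for i in x):
            return "List[List[int]]"
    raise TypeError("not a valid embeddings input")

print(classify_embedding_input("embed this text"))    # str  -> TTL applied
print(classify_embedding_input(["embed this text"]))  # List[str] -> ttl=-1 observed
```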

Why is this an issue?

Users calling LiteLLM from Langchain via OpenAIEmbeddings would send an array of tokens (int), ref. This fills the cache with key-value pairs that never expire, eventually causing the cache to run out of space.

I have also noticed that when Redis runs out of space in this manner, the LiteLLM pods tend to get stuck and restart whenever multiple calls are made.
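The failure mode can be modeled with a toy TTL store (fake clock, not real Redis): keys written with a TTL age out, while keys written with ttl=None accumulate forever, which is what eventually exhausts the cache:

```python
class ToyTtlStore:
    """Toy model of a TTL cache: set() with ttl=None never expires,
    which is how the buggy batch path fills Redis up over time."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at or None)

    def set(self, key, value, ttl=None, now=0.0):
        expires_at = None if ttl is None else now + ttl
        self._data[key] = (value, expires_at)

    def live_keys(self, now):
        return [k for k, (_, exp) in self._data.items()
                if exp is None or exp > now]

store = ToyTtlStore()
store.set("string-input", "resp", ttl=600, now=0)  # healthy single-input path
store.set("list-input", "resp", ttl=None, now=0)   # buggy batch path
print(store.live_keys(now=10_000))  # only the ttl=None key survives
```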

Relevant log output

Sanitized and summarized logs from main-v1.44.22-stable:

Request to litellm:

litellm.aembedding(api_key='xxx', api_base='https://xxx.openai.azure.com/', api_version='2024-06-01', model='azure/text-embedding-3-small', input=["input"], caching=True, client=<openai.lib.azure.AsyncAzureOpenAI object at 0x7f78a7654ed0>, encoding_format='base64', proxy_server_request={'url': 'http://litellm.monitoring:4000/embeddings', 'method': 'POST', 'headers': {'host': 'litellm.monitoring:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.3.9', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.3.9', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.12', 'authorization': '', 'x-stainless-async': 'false', 'content-length': '175'}, 'body': {'input': [[791, 432, 1929, 70910, 374, 459, 10309, 5507, 430, 8720, 288, 701, 11164, 596, 13708, 13]], 'model': 'text-embedding-3-small', 'encoding_format': 'base64'}}, metadata={'user_api_key': '', 'user_api_key_alias': '','user_api_end_user_max_budget': None, 'litellm_api_version': '1.44.22', 'global_max_parallel_requests': None, 'user_api_key_user_id': 'admin', 'user_api_key_org_id': None, 'user_api_key_team_id': '', 'user_api_key_team_alias': '', 'user_api_key_team_max_budget': None, 'user_api_key_team_spend': 53.1909487550011, 'user_api_key_spend': 0.4650396200000007, 'user_api_key_max_budget': None, 'user_api_key_metadata': {}, 'headers': {'host': 'litellm.monitoring:4000', 'accept-encoding': 'gzip, deflate', 'connection': 'keep-alive', 'accept': 'application/json', 'content-type': 'application/json', 'user-agent': 'OpenAI/Python 1.3.9', 'x-stainless-lang': 'python', 'x-stainless-package-version': '1.3.9', 'x-stainless-os': 'Linux', 'x-stainless-arch': 'x64', 'x-stainless-runtime': 'CPython', 'x-stainless-runtime-version': '3.10.12', 'x-stainless-async': 'false', 'content-length': '175'}, 'endpoint': 
'http://litellm.monitoring:4000/embeddings', 'litellm_parent_otel_span': None, 'requester_ip_address': '', 'model_group': 'text-embedding-3-small', 'deployment': 'azure/text-embedding-3-small', 'model_info': {'id': '', 'db_model': False, 'base_model': 'azure/text-embedding-3-small', 'max_tokens': 8191},'api_base': 'https://xxx.openai.azure.com/', 'caching_groups': None}, model_info={'id': 'xxx', 'db_model': False, 'base_model': 'azure/text-embedding-3-small', 'max_tokens': 8191}, timeout=None, max_retries=0) # note that no ttl values are included here

Initialized litellm callbacks, Async Success Callbacks: ['cache', <bound method Router.deployment_callback_on_success of <litellm.router.Router object at 0x7f78a8683850>>, <function _PROXY_track_cost_callback at 0x7f78a9f11c60>, <litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f78a9f269d0>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f78a9f26a10>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f78aa0cad10>, <litellm._service_logger.ServiceLogging object at 0x7f78a7ba6dd0>, <litellm.integrations.prometheus.PrometheusLogger object at 0x7f78a784ff90>, <bound method SlackAlerting.response_taking_too_long_callback of <litellm.integrations.slack_alerting.SlackAlerting object at 0x7f78aa1cf610>>]
ASYNC kwargs[caching]: True; litellm.cache: <litellm.caching.Cache object at 0x7f78a9ff8310>; kwargs.get('cache'): None
INSIDE CHECKING CACHE
Checking Cache

Getting Cache key. Kwargs: {} # note that no ttl values are included here

Created cache key: model: text-embedding-3-smallinput: inputencoding_format: base64
Hashed cache key (SHA-256): 98f1de774cc1627f2d926d6847cad7f0446cb2ba7205edf9585568e2abf73f98
Get Async Redis Cache: key: 98f1de774cc1627f2d926d6847cad7f0446cb2ba7205edf9585568e2abf73f98
Got Async Redis Cache: key: 98f1de774cc1627f2d926d6847cad7f0446cb2ba7205edf9585568e2abf73f98, cached_response None

{'model': 'text-embedding-3-small', 'messages': [{'role': 'user', 'content': ""}], 'optional_params': {}
RAW RESPONSE:
{"data": [{"embedding": "", "index": 0, "object": "embedding"}], "model": "text-embedding-3-small", "object": "list", "usage": {"prompt_tokens": 16, "total_tokens": 16}}

Looking up model=azure/text-embedding-3-small in model_cost_map
Success: model=azure/text-embedding-3-small in model_cost_map
prompt_tokens=16; completion_tokens=0
Returned custom cost for model=azure/text-embedding-3-small - prompt_tokens_cost_usd_dollar: 3.2e-07, completion_tokens_cost_usd_dollar: 0.0
Async Wrapper: Completed Call, calling async_success_handler: <bound method Logging.async_success_handler of <litellm.litellm_core_utils.litellm_logging.Logging object at 0x7f78a77bd810>>
Logging Details LiteLLM-Success Call: Cache_hit=None
Looking up model=azure/text-embedding-3-small in model_cost_map
Success: model=azure/text-embedding-3-small in model_cost_map
prompt_tokens=16; completion_tokens=0
Returned custom cost for model=azure/text-embedding-3-small - prompt_tokens_cost_usd_dollar: 3.2e-07, completion_tokens_cost_usd_dollar: 0.0
{"message": "litellm.aembedding(model=azure/text-embedding-3-small)\u001b[32m 200 OK\u001b[0m", "level": "INFO", "timestamp": "2024-10-02T06:22:00.316960"}

{"message": "Async Response: EmbeddingResponse(model='text-embedding-3-small', data=[{'embedding': 'xxx', 'index': 0, 'object': 'embedding'}], object='list', usage=Usage(completion_tokens=0, prompt_tokens=16, total_tokens=16))", "level": "DEBUG", "timestamp": "2024-10-02T06:22:00.317126"}
Getting Cache key. Kwargs: {'model': 'text-embedding-3-small', 'messages': [{'role': 'user', 'content': "input"}], 'optional_params': {}

Created cache key: model: text-embedding-3-smallinput: inputencoding_format: base64
Hashed cache key (SHA-256): 98f1de774cc1627f2d926d6847cad7f0446cb2ba7205edf9585568e2abf73f98
Set Async Redis Cache: key list: [('98f1de774cc1627f2d926d6847cad7f0446cb2ba7205edf9585568e2abf73f98', {'timestamp': 1727850120.3173609, 'response': {'embedding': '', 'index': 0, 'object': 'embedding'}})]
ttl=None, redis_version=7.1.0
Set ASYNC Redis Cache PIPELINE: key: 98f1de774cc1627f2d926d6847cad7f0446cb2ba7205edf9585568e2abf73f98
Value {'timestamp': 1727850120.3173609, 'response': {'embedding': '', 'index': 0, 'object': 'embedding'}}
ttl=None
Logging Details LiteLLM-Async Success Call, cache_hit=None
Looking up model=azure/text-embedding-3-small in model_cost_map
Success: model=azure/text-embedding-3-small in model_cost_map
prompt_tokens=16; completion_tokens=0
Returned custom cost for model=azure/text-embedding-3-small - prompt_tokens_cost_usd_dollar: 3.2e-07, completion_tokens_cost_usd_dollar: 0.0


nhs-work commented 1 day ago

Based on preliminary investigation and the logs, the code flow appears to be as follows (feel free to correct me if I'm wrong):

  1. A call is made to async def wrapper_async in litellm/utils.py's client, which calls https://github.com/BerriAI/litellm/blob/e19bb55e3b4c6a858b6e364302ebbf6633a51de5/litellm/utils.py#L1068 (hence the Request to litellm: log line)
  2. This calls https://github.com/BerriAI/litellm/blob/e19bb55e3b4c6a858b6e364302ebbf6633a51de5/litellm/utils.py#L1459-L1463 with the initial args and kwargs printed above (which do not include ttl)
  3. This calls https://github.com/BerriAI/litellm/blob/e19bb55e3b4c6a858b6e364302ebbf6633a51de5/litellm/caching.py#L2619
  4. This prints out Set Async Redis Cache: key list: https://github.com/BerriAI/litellm/blob/e19bb55e3b4c6a858b6e364302ebbf6633a51de5/litellm/caching.py#L471

It seems like the ttl configs are not being passed down correctly on the batch cache-write path?
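If that reading is right, the single-key write falls back to the cache's configured default TTL while the pipeline (batch) write only looks at the per-call kwargs, which contain no ttl. A sketch of the suspected divergence and the fix (class and method names are guesses at the shape of litellm/caching.py, not the actual code):

```python
class RedisCacheSketch:
    """Toy stand-in for the Redis cache class, showing only TTL handling."""
    def __init__(self, default_ttl=None):
        self.default_ttl = default_ttl
        self.writes = []  # (key, ttl) pairs, in place of real Redis SET calls

    def set_cache(self, key, value, **kwargs):
        # single-key path: per-call ttl, else the configured default
        ttl = kwargs.get("ttl")
        if ttl is None:
            ttl = self.default_ttl
        self.writes.append((key, ttl))

    def set_cache_pipeline(self, cache_list, **kwargs):
        # suspected bug: reading only kwargs.get("ttl") leaves ttl=None here;
        # adding the same default_ttl fallback mirrors the single-key path
        ttl = kwargs.get("ttl")
        if ttl is None:
            ttl = self.default_ttl
        for key, value in cache_list:
            self.writes.append((key, ttl))

cache = RedisCacheSketch(default_ttl=600)
cache.set_cache("k1", "v1")               # single input -> ttl 600
cache.set_cache_pipeline([("k2", "v2")])  # batch input -> ttl 600 with the fallback
print(cache.writes)
```

Without the fallback in set_cache_pipeline, the batch write would go out with ttl=None, matching the `ttl=None` lines in the logs above.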