BerriAI / litellm

Python SDK, Proxy Server (LLM Gateway) to call 100+ LLM APIs in OpenAI format - [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, Replicate, Groq]
https://docs.litellm.ai/docs/

[Bug]: S3 cache doesn't work well with time to live #3852

Open pharindoko opened 4 months ago

pharindoko commented 4 months ago

What happened?

After trying out string, float etc., which failed:

litellm.cache = Cache(type="s3", s3_bucket_name=cache_bucket_name, s3_region_name="eu-central-1", ttl=600)

I used the following script:

litellm.cache = Cache(type="s3", s3_bucket_name=cache_bucket_name, s3_region_name="eu-central-1", ttl=datetime.timedelta(minutes=1))

This is accepted and writes the metadata (screenshot omitted).

But this won't delete the file in S3. And even if the file is kept in S3, litellm won't check in the get_cache method that the entry has already expired.

Relevant log output

No response

Twitter / LinkedIn details

No response

Manouchehri commented 4 months ago

This is a really good observation.

For cache matching, I think we could do either or both:

  1. Use If-Modified-Since to enforce the current ttl. This would not obey what the old ttl was, only the current one.
  2. Check that the Expires header has not passed yet.

What do you think should happen?
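The second option could be sketched roughly as follows. This is only an illustration, not litellm's actual code: `is_expired` and `read_if_fresh` are hypothetical helper names, and it assumes the cached object was written with an Expires value that boto3 returns as a `datetime` under the `"Expires"` key of the `get_object` response.

```python
from datetime import datetime, timezone
from typing import Optional


def is_expired(expires: Optional[datetime], now: Optional[datetime] = None) -> bool:
    """Return True when the stored Expires timestamp is not in the future."""
    if expires is None:
        # Assumption: an object without Expires metadata never expires.
        return False
    if expires.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        expires = expires.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return now >= expires


def read_if_fresh(s3_client, bucket: str, key: str) -> Optional[bytes]:
    """Fetch an object and return its bytes, or None if it has expired."""
    response = s3_client.get_object(Bucket=bucket, Key=key)
    if is_expired(response.get("Expires")):
        return None  # behave like a cache miss
    return response["Body"].read()
```

With this shape, an expired object is simply reported as a miss; actually deleting it from the bucket would still need a separate step, for example an S3 lifecycle rule.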

krrishdholakia commented 4 months ago

@pharindoko i wouldn't recommend s3 caching for production. It uses boto3 which is sync.

For prod - we recommend using the redis cache - https://docs.litellm.ai/docs/proxy/prod

Any reason you need to use s3 here? @pharindoko

pharindoko commented 4 months ago

> @pharindoko i wouldn't recommend s3 caching for production. It uses boto3 which is sync.
>
> For prod - we recommend using the redis cache - https://docs.litellm.ai/docs/proxy/prod
>
> Any reason you need to use s3 here? @pharindoko

Yes. I'm on AWS. S3 is easier to create and destroy, and it is pay-as-you-go. Redis is nice but comes with additional costs on AWS. I'm not searching for the last 35 ms, and S3 works quite well so far.

pharindoko commented 4 months ago

@Manouchehri A check for 'Expires' would be nice and I assume easy to implement.

pharindoko commented 4 months ago

Hmm you're right with the sync calls to S3.

Would it be OK for you to make them asynchronous using asyncio, as described in this article?

https://medium.com/@s.zeort/asynchronous-aws-s3-client-in-python-4f6b33829da6

This way no additional library needs to be added.
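One common way to do this with the standard library alone is to push the blocking call onto a thread pool. The sketch below may or may not match the article's exact approach; `blocking_get` is a hypothetical stand-in for a synchronous boto3 call such as `s3_client.get_object`, so the example runs without AWS access.

```python
import asyncio
import time


def blocking_get(key: str) -> str:
    """Stand-in for a synchronous boto3 call (hypothetical)."""
    time.sleep(0.1)  # simulate network latency
    return f"value-for-{key}"


async def async_get(key: str) -> str:
    # Run the blocking call in the default thread pool so the event loop
    # stays free to serve other coroutines in the meantime.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_get, key)


async def main() -> list:
    # The three lookups overlap instead of running back to back.
    return await asyncio.gather(*(async_get(k) for k in ("a", "b", "c")))
```

Calling `asyncio.run(main())` returns the three values in request order. The calls still occupy threads, so this hides the blocking rather than removing it.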

krrishdholakia commented 4 months ago

we have an async implementation for aws here - https://github.com/BerriAI/litellm/blob/67da24f1444e292553506ed2a581a4571d7ca949/litellm/llms/bedrock_httpx.py#L880

I can try and do it for s3 too.

Can we do a quick call though? Want to make sure i understand the use-case

I can meet later today as well - https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

@pharindoko @ishaan-jaff

pharindoko commented 4 months ago

Sorry, I'm in the CET timezone and don't have that much time at the moment. There are two separate topics: the expiration and the sync calls.

If you could fix either one, that would be great. If not, that's OK, I'll have to accept it, and if I'm annoyed by it I can still create my own cache plugin, right? :)

krrishdholakia commented 4 months ago

sure - i'll prioritize the async implementation and then implement the ttl

@pharindoko can we setup a support channel to make sure the fixes work as required?

Discord: https://discord.com/invite/wuPM9dRgDw

Manouchehri commented 4 months ago

> we have an async implementation here for aws here -

@krrishdholakia See #3860 for tracking the async S3 issue. :)

krrishdholakia commented 4 months ago

thanks @Manouchehri

Manouchehri commented 4 months ago

> sure - i'll prioritize the async implementation and then implement the ttl

Do a presigned URL with boto3 as in https://github.com/BerriAI/litellm/issues/3860#issuecomment-2134155490, and then add an If-Modified-Since header when fetching that URL to enforce a TTL. =) IMO that's the easiest way, and it lets the user change the TTL whenever they want without needing to modify the object.
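A rough sketch of the header side of that idea, using only the standard library. The helper names `if_modified_since` and `is_cache_hit` are hypothetical, and the presigned URL itself is assumed to come from boto3's `generate_presigned_url("get_object", ...)`, which is not shown here.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime
from typing import Optional


def if_modified_since(ttl_seconds: float, now: Optional[datetime] = None) -> str:
    """Build an If-Modified-Since value that lies ttl_seconds in the past."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(seconds=ttl_seconds)).astimezone(timezone.utc)
    # usegmt=True yields the HTTP date form ending in "GMT".
    return format_datetime(cutoff, usegmt=True)


def is_cache_hit(status_code: int) -> bool:
    """Interpret the response to the conditional GET.

    200 means the object was modified after the cutoff, so it is still fresh.
    304 Not Modified means it is older than the TTL, so treat it as a miss.
    """
    return status_code == 200
```

For example, `if_modified_since(600, datetime(2024, 5, 27, 12, 0, tzinfo=timezone.utc))` gives `"Mon, 27 May 2024 11:50:00 GMT"`. Because the cutoff is computed at read time, changing the TTL takes effect immediately for objects already in the bucket.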