Open · databill86 opened this issue 1 month ago
Oh! This isn't necessarily a bug, but our caching logic != OpenAI's prompt caching.
Your cached token field (cached_tokens=1024) is from OpenAI's response, not from the proxy, @databill86.
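To make that concrete: cached_tokens sits inside the usage block that OpenAI returns and the proxy forwards, while LiteLLM's own cache_hit tracks the proxy-side response cache. A rough sketch of that usage block as a Python dict (values are illustrative, not from a real response):

```python
# Illustrative usage block as forwarded through the proxy (values made up).
usage = {
    "prompt_tokens": 2006,
    "completion_tokens": 300,
    "total_tokens": 2306,
    # Reported by OpenAI's prompt caching, not by LiteLLM's response cache:
    "prompt_tokens_details": {"cached_tokens": 1024},
}
```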
Oh, I see! That clears up some of the confusion.
However, I was specifically referring to OpenAI's prompt caching behavior as outlined in their documentation. Do you plan on supporting something more aligned with their caching mechanism in future releases?
What happened?
When using the LiteLLM OpenAI proxy, I've noticed that the caching functionality is not working as expected. Specifically:
The cache_hit value is always 0, even when cached tokens are being used in the API response.
Code to Reproduce
Here's a minimal example that demonstrates the issue:
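A sketch of such a script, assuming the proxy runs at http://localhost:4000 with a placeholder virtual key sk-1234, a placeholder model name, and a prompt long enough to trigger OpenAI prompt caching:

```python
from openai import OpenAI

# Placeholder proxy URL and virtual key -- adjust to your deployment.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# A prompt long enough (>= 1024 tokens) to be eligible for OpenAI prompt caching.
long_prompt = "Explain how a reverse proxy handles connection pooling. " * 200

for i in range(2):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": long_prompt}],
    )
    # On the second call OpenAI reports cached prompt tokens here ...
    print(f"request {i}: {resp.usage.prompt_tokens_details}")
    # ... while the proxy UI / spend logs still show cache_hit = 0
    # and a cached tokens count of 0.
```

With this, the second response's usage should report cached_tokens > 0 from OpenAI, while the proxy's cache_hit and UI counter stay at 0, matching the behaviour described below.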
The cached tokens count in the UI also stays at 0 and does not change, even after multiple requests that hit the cache.
litellm version:
image: ghcr.io/berriai/litellm:main-v1.49.3
Relevant log output
Twitter / LinkedIn details
No response