Closed tuanardouin closed 3 weeks ago
Anthropic just announced a new feature, "Prompt caching". It lowers cost and reduces latency, particularly for large contexts.
Extract from the article:
Prompt caching, which enables developers to cache frequently used context between API calls, is now available on the Anthropic API. With prompt caching, customers can provide Claude with more background knowledge and example outputs—all while reducing costs by up to 90% and latency by up to 85% for long prompts. Prompt caching is available today in public beta for Claude 3.5 Sonnet and Claude 3 Haiku, with support for Claude 3 Opus coming soon.
Apparently, you just have to add
"cache_control": {"type": "ephemeral"}
to the system to benefit from it. And it only works with anthropic-beta: prompt-caching-2024-07-31
in the header. I'm not familiar with the codebase, but if someone can give me some indications and point me in the right direction, I could add it.
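Putting the two pieces from the announcement together, the request could look roughly like this, a sketch against the public Messages API where the model name, API key, and system text are placeholders:

```python
API_KEY = "sk-ant-..."  # placeholder

def build_cached_request(system_text, user_text):
    """Build headers and body for a Messages API call whose large,
    reused system prompt is marked cacheable. Only the short user
    turn is expected to change between calls."""
    headers = {
        "x-api-key": API_KEY,
        "anthropic-version": "2023-06-01",
        # Opt in to the prompt-caching beta, per the announcement.
        "anthropic-beta": "prompt-caching-2024-07-31",
        "content-type": "application/json",
    }
    body = {
        "model": "claude-3-5-sonnet-20240620",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_text,
                # The flag quoted above, attached to the cached block.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_text}],
    }
    return headers, body

headers, body = build_cached_request(
    "Very long background context...", "Summarize the context."
)
# The pair would then be POSTed to https://api.anthropic.com/v1/messages
# with any HTTP client.
```

The key point is that cache_control sits on an individual system content block, so the system field becomes a list of blocks rather than a plain string.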
yes please I am spending a fortune and possibly this will help us all save a lot of money!
Don't forget the ~5 minute cache timeout when implementing this. It might make sense to consider performing an otherwise meaningless "refresh" API call in some instances rather than pushing the entire context on a subsequent call. Something to consider.
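That keep-alive idea could be sketched like this; the class and its names are hypothetical, and the exact TTL behaviour should be verified against the docs:

```python
import time

CACHE_TTL_SECONDS = 5 * 60  # ~5 minutes, per the docs quoted below
SAFETY_MARGIN = 30          # refresh comfortably before expiry

class CacheKeeper:
    """Tracks when the cached prompt was last used and decides whether
    a cheap 'refresh' call is worth issuing before the TTL lapses."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._last_hit = clock()

    def touch(self):
        # Any call that reuses the cached prefix refreshes the TTL,
        # so record the time of every real request too.
        self._last_hit = self._clock()

    def needs_refresh(self):
        elapsed = self._clock() - self._last_hit
        return elapsed >= CACHE_TTL_SECONDS - SAFETY_MARGIN
```

A background task could poll needs_refresh() and, when it returns True, fire a minimal request that reuses the cached prefix, then call touch().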
@carlanwray Good point, I didn't see that. So it's becoming a bit more complex than I thought.
What is the cache lifetime?
The cache has a lifetime (TTL) of about 5 minutes. This lifetime is refreshed each time the cached content is used.
I don't understand what happens after the 5 minutes. Will we get an error? Or will the caching price simply not apply because the content needs to be cached again?
How does Prompt Caching affect pricing?
Prompt Caching introduces a new pricing structure where cache writes cost 25% more than base input tokens, while cache hits cost only 10% of the base input token price.
I didn't see that. So the first API call will be more expensive, and it's the subsequent ones that are cheaper?
I need to run some tests before starting anything.
@Doriandarko are you still maintaining this and accepting PRs?
Yeah I was working on this!
Grazie Pietro!
This is done