This PR adds prompt caching support when running inference on Claude models. It significantly reduces inference cost on BFCL's multi-turn datasets when using the following models (in both Function Calling and Prompting modes):
- Claude 3.5 Sonnet
- Claude 3 Haiku
- Claude 3 Opus
Summary of changes made:
- Cached user messages
- Cached system prompt (for Prompting mode)
- Cached tools (for Function Calling mode; see the sketch below)
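A minimal sketch of where the `cache_control` breakpoints go when calling the Anthropic Python SDK directly; the system prompt, tool schema, and conversation below are placeholder examples rather than BFCL handler internals, and depending on the SDK version a prompt-caching beta header may also be required:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Placeholder inputs; in BFCL these come from the benchmark entries.
system_prompt = "You are a helpful assistant that can call tools."
tool_schemas = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "input_schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    }
]
conversation = [
    {
        "role": "user",
        "content": [{"type": "text", "text": "What is the weather in Paris?"}],
    }
]

# Cache the latest user message so earlier turns are read from the cache
# on the next request of the multi-turn loop.
conversation[-1]["content"][-1]["cache_control"] = {"type": "ephemeral"}

# Prompting mode: cache the system prompt by passing it as a content block
# carrying a cache_control breakpoint. (Per Anthropic's docs, prompts below
# the minimum cacheable length are simply not cached.)
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": system_prompt,
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=conversation,
)

# Function Calling mode: cache the tool definitions instead; a breakpoint on
# the last tool caches the entire tools block.
tool_schemas[-1]["cache_control"] = {"type": "ephemeral"}
response = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=1024,
    tools=tool_schemas,
    messages=conversation,
)
```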
Please note:
- Caching is intentionally skipped in single-turn cases, since there are no subsequent turns that could benefit from cache reads (a sketch of this guard follows these notes).
- According to the Anthropic guide, prompt caching does not affect model accuracy:

> Prompt caching has no effect on output token generation. The response you receive will be identical to what you would get if prompt caching was not used.
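A hypothetical sketch of the single-turn guard described above; the function name and the `is_multi_turn` flag are illustrative, not the handler's actual API:

```python
def add_cache_breakpoint(messages: list[dict], is_multi_turn: bool) -> list[dict]:
    """Attach a cache breakpoint to the last user message, but only for
    multi-turn entries; a single-turn prompt is sent exactly once, so a
    cache write would add cost without ever being read back."""
    if not is_multi_turn or not messages:
        return messages
    last = messages[-1]
    if isinstance(last.get("content"), str):
        # Normalize string content into a block list so cache_control can attach.
        last["content"] = [{"type": "text", "text": last["content"]}]
    last["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return messages
```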