Closed ishaan-jaff closed 2 weeks ago
Aiming to have this be compatible with Anthropic prompt caching
Looks like you can specify the ID of the cached object. We could follow something like our _get_cache_key logic: generate a unique hash for the cached object -> store it -> check if the cached object exists.
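A minimal sketch of what that key generation could look like, assuming a _get_cache_key-style approach of hashing the cache-relevant inputs (the function name and field choices here are assumptions for illustration, not litellm's actual implementation):

```python
import hashlib
import json

def get_gemini_cache_key(model: str, cached_messages: list) -> str:
    """Deterministic hash for content we want Gemini to cache.

    Serializes the cache-relevant inputs with sorted keys so that
    repeated requests with the same model + cached block always map
    to the same key, which we can then store and look up.
    """
    payload = json.dumps(
        {"model": model, "messages": cached_messages}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```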
Looks like there might be size requirements on what gets cached
Unlike Anthropic, there is a minimum input token count for what can be cached -
Anthropic also has a min input token requirement for caching btw
got it
curl -X POST "https://generativelanguage.googleapis.com/v1beta/cachedContents?key=$GEMINI_API_KEY" \
  -H 'Content-Type: application/json' \
  -d @request.json > cache.json

cat cache.json
{
  "name": "cachedContents/4d2kd477o3pg",
  "model": "models/gemini-1.5-flash-001",
  "createTime": "2024-08-26T22:31:16.147190Z",
  "updateTime": "2024-08-26T22:31:16.147190Z",
  "expireTime": "2024-08-26T22:36:15.548934784Z",
  "displayName": "",
  "usageMetadata": { "totalTokenCount": 323383 }
}
Looks like the cache key is returned as part of the response object.
When you do a curl -X GET on the cache name, you just get back the cache response object:

curl "https://generativelanguage.googleapis.com/v1beta/cachedContents/4d2kd477o3pg?key=$GEMINI_API_KEY"
{
  "name": "cachedContents/4d2kd477o3pg",
  "model": "models/gemini-1.5-flash-001",
  "createTime": "2024-08-26T22:31:16.147190Z",
  "updateTime": "2024-08-26T22:31:16.147190Z",
  "expireTime": "2024-08-26T22:36:15.548934784Z",
  "displayName": "",
  "usageMetadata": { "totalTokenCount": 323383 }
}
This is probably a good way to check whether a cached key already exists on Google's side; if not -> create it -> run the request.
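That check-if-exists -> create flow could be sketched like this. `get_fn`/`post_fn` stand in for an HTTP client (e.g. `requests.get`/`requests.post`) and are injected so the flow is testable; the endpoint paths mirror the curl calls above, but the function name and error handling are assumptions for this sketch:

```python
GEMINI_BASE = "https://generativelanguage.googleapis.com/v1beta"

def get_or_create_cached_content(api_key, cache_name, create_body, get_fn, post_fn):
    """GET the cachedContents object by name; if it doesn't exist, POST to create it.

    get_fn/post_fn take (url, params=..., json=...) like requests.get /
    requests.post, injected so the flow can be exercised without network access.
    """
    resp = get_fn(f"{GEMINI_BASE}/{cache_name}", params={"key": api_key})
    if resp.status_code == 200:
        # Cache object exists on Google's side: reuse it.
        return resp.json()
    # Cache miss: create the cached content, then the caller can run the request.
    resp = post_fn(
        f"{GEMINI_BASE}/cachedContents", params={"key": api_key}, json=create_body
    )
    resp.raise_for_status()
    return resp.json()
```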
Looks like Gemini only allows 1 cached content per request message. So we'll probably need to add a check on the input messages for multiple cached messages?
curl -X POST "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-001:generateContent?key=$GOOGLE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "contents": [
      {
        "parts": [{
          "text": "Please summarize this transcript"
        }],
        "role": "user"
      }
    ],
    "cachedContent": "'$CACHE_NAME'"
  }'
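The single-cached-content constraint could be enforced with a check like this, assuming an Anthropic-style convention where a message is marked with a `cache_control` field (the field name and function names here are assumptions for the sketch, not litellm's actual API):

```python
def find_cached_messages(messages: list) -> list:
    """Return the indices of messages marked for caching via cache_control."""
    return [i for i, m in enumerate(messages) if m.get("cache_control")]

def validate_single_cached_message(messages: list) -> None:
    """Raise if more than one message is marked, since Gemini accepts
    only one cachedContent per generateContent request."""
    cached = find_cached_messages(messages)
    if len(cached) > 1:
        raise ValueError(
            f"Gemini supports only 1 cachedContent per request; "
            f"got {len(cached)} cached messages at indices {cached}"
        )
```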
I guess for v0, just support 1 message being cached.
Future improvement: support a block of continuous messages being cached. (Vertex allows passing this.)
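That future improvement could start from something like this: split the input into the leading contiguous run of cache-marked messages and the remainder. This assumes the cached block is a prefix (the typical system-prompt/transcript case) and reuses the assumed `cache_control` marker; it's a sketch, not a committed design:

```python
def split_contiguous_cached_block(messages: list):
    """Split messages into (cached_block, remainder), where cached_block is
    the longest contiguous prefix of messages marked with cache_control.

    The cached_block would be uploaded once as a Gemini cachedContents
    object; the remainder is sent normally with the request.
    """
    i = 0
    while i < len(messages) and messages[i].get("cache_control"):
        i += 1
    return messages[:i], messages[i:]
```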