For the moment it will pick up the OpenAI key from a plugin secret.
I'm going to use `httpx` directly against their API for this, because they recently broke the OpenAI Python library and I don't want to have to deal with that while LLM is still on the old version.
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-vision-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'
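Here's roughly the same request made with `httpx` - a minimal sketch, assuming the key has already been resolved from the plugin secret (the `OPENAI_API_KEY` variable here is a stand-in for that):

```python
import httpx

OPENAI_API_KEY = "sk-..."  # stand-in for the key from the plugin secret

response = httpx.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {OPENAI_API_KEY}"},
    json={
        "model": "gpt-4-vision-preview",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What's in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                        },
                    },
                ],
            }
        ],
        "max_tokens": 300,
    },
    timeout=60.0,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

(`httpx` sets the `Content-Type: application/json` header automatically when you use `json=`.)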
And here's the tools / function calling example:
curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
To force tool choice of a specific function:
{"tool_choice": {"type": "function", "function": {"name": "my_function"}}}
No need to use streaming here, as enrichments are not human-interactive while they are running - which is good, because it means we get the usage block saying how many tokens we spent.
Here's what that first example returns BTW:
{
  "id": "chatcmpl-8LkBTAG7j7E4WypRCz9PztE18gHF3",
  "object": "chat.completion",
  "created": 1700193111,
  "model": "gpt-4-1106-vision-preview",
  "usage": {
    "prompt_tokens": 1118,
    "completion_tokens": 109,
    "total_tokens": 1227
  },
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "This image shows a wooden boardwalk stretching through a lush green meadow with tall grasses on either side. The sky above is partly cloudy with blue patches, indicative of fair weather. There are scattered trees and shrubs in the distance, which suggest that the setting might be in a nature reserve, wetland, or a similar natural environment. The perspective of the boardwalk draws the viewer's eye towards the horizon, adding depth to the landscape. The warm lighting implies that the photo could have been taken during the late afternoon or early evening."
      },
      "finish_details": {
        "type": "stop",
        "stop": "<|fim_suffix|>"
      },
      "index": 0
    }
  ]
}
Pricing for this image: 1118 prompt tokens at $0.01/1K input plus 109 completion tokens at $0.03/1K output = $0.01445 - about 1.5 cents.
I built a tool for that: https://chat.openai.com/g/g-jpbGrxLNf-gpt-pricing-calculator
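The same arithmetic as a quick sketch, with the current rates hard-coded (they will change over time):

```python
# gpt-4-vision-preview pricing: $0.01 per 1K input tokens, $0.03 per 1K output tokens
def vision_cost(prompt_tokens: int, completion_tokens: int) -> float:
    return prompt_tokens * 0.01 / 1000 + completion_tokens * 0.03 / 1000

print(vision_cost(1118, 109))  # ~0.01445 dollars - about 1.5 cents
```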
I'm going to take advantage of the ability to have a custom template for the enrichment configuration page. Then I'll have a way of switching between the three modes: text prompt, text and image prompt, or structured JSON prompt.
It should support GPT-3.5 as well, since that's 10-15x cheaper.
I need to think about rate limits, which are quite complicated - they vary depending on how old your account is, among other factors.
https://platform.openai.com/docs/guides/rate-limits/rate-limits-in-headers?context=tier-free
They return rate limit information in HTTP headers, so a smart implementation would use those to adjust the speed at which we are hitting them:
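Something along these lines - a hypothetical sketch where the only real things are the header names, and the back-off rule is invented:

```python
import time

import httpx

def post_and_throttle(url: str, headers: dict, body: dict, min_remaining: int = 5) -> httpx.Response:
    # Sketch: issue the request, then pause if we are close to the
    # requests-per-minute limit according to the response headers.
    response = httpx.post(url, headers=headers, json=body, timeout=60.0)
    remaining = int(response.headers.get("x-ratelimit-remaining-requests", "1"))
    if remaining < min_remaining:
        # A real implementation would parse x-ratelimit-reset-requests
        # and sleep for that long; this fixed delay is a placeholder.
        time.sleep(1.0)
    return response
```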
Batching would be nice, but it looks like that's only available for the old completion API at the moment.
https://platform.openai.com/docs/guides/rate-limits/batching-requests says:
If you're hitting the limit on requests per minute, but have available capacity on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This will allow you to process more tokens per minute, especially with our smaller models.
So I don't think it saves money, just reduces your rate of requests (but not tokens) per minute.
I'd actually like to not have to pay for the instruction / system prompt more than once. Ideally I'd be able to send a single system prompt and multiple data prompts in the same request - that's not supported by their API, but might be possible using prompt engineering. I'm worried about the impact different rows in the same batch may have on each other though.
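One possible (entirely untested) prompt layout for that - a single system prompt, each row's data tagged with an identifier, and the model asked to reply with one JSON object covering all of them:

```python
# Untested idea: batch several rows behind one shared system prompt.
# The worry above still applies - rows may influence each other's answers.
system_prompt = (
    "For each numbered item, extract the city it mentions. "
    'Reply with a JSON object mapping item number to city, e.g. {"1": "Boston"}.'
)
rows = {1: "I flew to Boston last week", 2: "The boardwalk in Madison was lovely"}

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "\n".join(f"{i}. {text}" for i, text in rows.items())},
]
```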
Here's some complex example code that adjusts the rate based on rules you give it, but does not seem to take the headers into account: https://github.com/openai/openai-cookbook/blob/feef1bf3982e15ad180e17732525ddbadaf2b670/examples/api_request_parallel_processor.py
I ran the earlier 3.5 `curl` with `-i` to see the headers:
x-ratelimit-limit-requests: 5000
x-ratelimit-limit-tokens: 160000
x-ratelimit-limit-tokens_usage_based: 160000
x-ratelimit-remaining-requests: 4999
x-ratelimit-remaining-tokens: 159974
x-ratelimit-remaining-tokens_usage_based: 159974
x-ratelimit-reset-requests: 12ms
x-ratelimit-reset-tokens: 9ms
x-ratelimit-reset-tokens_usage_based: 9ms
So for 3.5 it looks like my rate limits reset every ~10ms, so they are effectively unlimited.
Rate limits for vision are MUCH tighter:
openai-model: gpt-4-1106-vision-preview
openai-organization: user-r3e61fpak04cbaokp5buoae4
openai-processing-ms: 5627
openai-version: 2020-10-01
x-ratelimit-limit-requests: 100
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 99
x-ratelimit-remaining-tokens: 39693
x-ratelimit-reset-requests: 14m24s
x-ratelimit-reset-tokens: 460ms
Ran that again a few seconds later:
x-ratelimit-limit-requests: 100
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 98
x-ratelimit-remaining-tokens: 39693
x-ratelimit-reset-requests: 28m3.723s
x-ratelimit-reset-tokens: 460ms
Weird that the `x-ratelimit-reset-requests` went up from 14m to 28m.
Looks like I get 100 requests per 15 minutes.
A third request:
x-ratelimit-limit-requests: 100
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 97
x-ratelimit-remaining-tokens: 39693
x-ratelimit-reset-requests: 41m22.419s
x-ratelimit-reset-tokens: 460ms
Now it's up to 41m for the reset?
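Whatever is going on with those numbers, the reset headers come back in mixed formats (9ms, 460ms, 14m24s, 28m3.723s), so any implementation that uses them needs a small parser - something like this sketch:

```python
import re

def parse_reset(value: str) -> float:
    """Convert durations like '9ms', '460ms', '14m24s' or '28m3.723s' to seconds."""
    units = {"ms": 0.001, "s": 1, "m": 60, "h": 3600}
    return sum(
        float(amount) * units[unit]
        for amount, unit in re.findall(r"([\d.]+)(ms|h|m|s)", value)
    )

print(parse_reset("14m24s"))     # 864.0
print(parse_reset("28m3.723s"))  # ~1683.7
```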
https://platform.openai.com/account/limits shows my own rate limits. It looks like vision currently has a hard 100/day limit for almost all users.
I'm currently in tier 3. The tier 3 and tier 4 rate limit tables are documented here:
https://platform.openai.com/docs/guides/rate-limits/usage-tiers
Upgrading to tier 4 still limits to 100 images a day.
So regular (expensive) GPT-4 is currently 5,000 requests/minute on tier 3, but GPT-4 Turbo is only 500. I wonder how much that will change when `gpt-4-1106-preview` becomes the new default, which I think is soon.
For most models I can make a cheap API request to figure out the available rate limits before the user kicks off the process, to help them understand if they have enough capacity or not.
But not for GPT vision, because that one (currently) only allows 100 requests a day and I don't want to burn one just to check the rate limit.
I asked about that on the forum here: https://community.openai.com/t/possible-to-check-api-rate-limit-headers-without-burning-a-request/510041
What I could do instead is store a `_enrichments_gpt_rate_limits` table that's updated (for each organization / model pair) after each request to store the latest limits:
openai-model: gpt-3.5-turbo-0613
openai-organization: user-r3e61fpak04cbaokp5buoae4
openai-processing-ms: 865
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-ratelimit-limit-requests: 5000
x-ratelimit-limit-tokens: 160000
x-ratelimit-limit-tokens_usage_based: 160000
x-ratelimit-remaining-requests: 4999
x-ratelimit-remaining-tokens: 159974
x-ratelimit-remaining-tokens_usage_based: 159974
x-ratelimit-reset-requests: 12ms
x-ratelimit-reset-tokens: 9ms
x-ratelimit-reset-tokens_usage_based: 9ms
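A sketch of what writing that table could look like with sqlite-utils (the table and column names here are invented):

```python
import sqlite_utils

db = sqlite_utils.Database("enrichments.db")

def record_rate_limits(organization: str, model: str, headers) -> None:
    # One row per (organization, model) pair, replaced after every response.
    db["_enrichments_gpt_rate_limits"].insert(
        {
            "organization": organization,
            "model": model,
            "limit_requests": headers.get("x-ratelimit-limit-requests"),
            "remaining_requests": headers.get("x-ratelimit-remaining-requests"),
            "limit_tokens": headers.get("x-ratelimit-limit-tokens"),
            "remaining_tokens": headers.get("x-ratelimit-remaining-tokens"),
            "reset_requests": headers.get("x-ratelimit-reset-requests"),
            "reset_tokens": headers.get("x-ratelimit-reset-tokens"),
        },
        pk=("organization", "model"),
        replace=True,
    )
```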
I worry a bit about the extra write traffic from all of those updates, but I imagine it will be fine - if it causes problems I can stop doing it.
For the first release I'll do something simple: watch the remaining tokens and cancel the run if they run out.
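For example (a sketch - how the cancellation actually gets surfaced is still to be decided):

```python
class OutOfTokens(Exception):
    pass

def check_remaining_tokens(response) -> None:
    # Sketch: abort the run once the remaining-tokens header hits zero.
    remaining = int(response.headers.get("x-ratelimit-remaining-tokens", "0"))
    if remaining <= 0:
        raise OutOfTokens("x-ratelimit-remaining-tokens exhausted - cancelling run")
```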
This plugin will let you run a prompt through GPT-4 in a similar way to datasette-enrichments-jinja - you'll be able to design a prompt using a template (that can include all of the columns in the current table) and specify the output column for it.
Potential bonus features:
To control costs I'd like to be able to set an optional budget too, which is tracked as the enrichment runs and allows it to terminate early if the budget is exceeded.
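A possible shape for that, reusing the per-1K-token rates from earlier (everything here is a sketch, not settled design):

```python
class BudgetTracker:
    """Accumulate spend from each response's usage block; stop when over budget."""

    def __init__(self, budget_dollars=None, input_rate=0.01, output_rate=0.03):
        self.budget = budget_dollars
        self.input_rate = input_rate / 1000
        self.output_rate = output_rate / 1000
        self.spent = 0.0

    def record(self, usage: dict) -> bool:
        # Returns False once the optional budget has been exceeded.
        self.spent += (
            usage["prompt_tokens"] * self.input_rate
            + usage["completion_tokens"] * self.output_rate
        )
        return self.budget is None or self.spent <= self.budget
```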