datasette / datasette-enrichments-gpt

Datasette enrichment for analyzing row data using OpenAI's GPT models
Apache License 2.0

Plugin design #1

Closed: simonw closed this issue 11 months ago

simonw commented 11 months ago

Datasette enrichment for analyzing row data using OpenAI's GPT models

This plugin will let you run a prompt through GPT-4, in a similar way to datasette-enrichments-jinja: you'll be able to design a prompt using a template (with access to all of the columns in the current table) and specify the output column for the result.
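
For illustration, here's a minimal sketch of what that template rendering might look like, assuming Jinja2 in the style of datasette-enrichments-jinja (render_prompt and the column names here are hypothetical):

from jinja2 import Template

def render_prompt(template_source: str, row: dict) -> str:
    # Interpolate the row's column values into the user's prompt template
    return Template(template_source).render(**row)

prompt = render_prompt(
    "Describe the photo titled {{ title }} at {{ url }}",
    {"title": "Boardwalk", "url": "https://example.com/photo.jpg"},
)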

Potential bonus features:

To control costs I'd like to be able to set an optional budget too, tracked as the enrichment runs, allowing the run to terminate early if the budget is exceeded.
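
A rough sketch of how that budget tracking could work (the class and rate parameters are hypothetical, not settled design):

class BudgetExceeded(Exception):
    pass

class BudgetTracker:
    # Accumulates estimated spend in USD and raises once the budget
    # is exceeded, so the enrichment run can terminate early
    def __init__(self, budget_usd: float):
        self.budget_usd = budget_usd
        self.spent_usd = 0.0

    def record(self, prompt_tokens, completion_tokens, prompt_rate, completion_rate):
        # Rates are dollars per 1,000 tokens for the selected model
        self.spent_usd += prompt_tokens / 1000 * prompt_rate
        self.spent_usd += completion_tokens / 1000 * completion_rate
        if self.spent_usd > self.budget_usd:
            raise BudgetExceeded(
                f"Spent ${self.spent_usd:.4f} of ${self.budget_usd:.2f} budget"
            )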

simonw commented 11 months ago

For the moment it will pick up the OpenAI key from a plugin secret.

simonw commented 11 months ago

I'm going to use httpx directly against their API for this, because they recently shipped breaking changes to the OpenAI Python library and I don't want to have to work around that while LLM is still on the old version.

curl https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-vision-preview",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What’s in this image?"
          },
          {
            "type": "image_url",
            "image_url": {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
            }
          }
        ]
      }
    ],
    "max_tokens": 300
  }'

And:

curl https://api.openai.com/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-d '{
  "model": "gpt-3.5-turbo",
  "messages": [
    {
      "role": "user",
      "content": "What is the weather like in Boston?"
    }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The city and state, e.g. San Francisco, CA"
            },
            "unit": {
              "type": "string",
              "enum": ["celsius", "fahrenheit"]
            }
          },
          "required": ["location"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}'

To force tool choice of a specific function:

{"tool_choice": {"type": "function", "function": {"name": "my_function"}}}

No need to use streaming here, since enrichments are not human-interactive while they run. That's convenient, because non-streaming responses include the usage block reporting how many tokens we spent.
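
Something like this httpx call is what I have in mind - a sketch, not final code (chat_completion is a hypothetical name):

import os
import httpx

async def chat_completion(messages, model="gpt-3.5-turbo"):
    # Non-streaming request: the JSON response includes the "usage"
    # block with prompt_tokens and completion_tokens
    async with httpx.AsyncClient(timeout=60.0) as client:
        response = await client.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json={"model": model, "messages": messages},
        )
    response.raise_for_status()
    data = response.json()
    return data["choices"][0]["message"]["content"], data["usage"]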

simonw commented 11 months ago

Here's what that first example returns BTW:

{
    "id": "chatcmpl-8LkBTAG7j7E4WypRCz9PztE18gHF3",
    "object": "chat.completion",
    "created": 1700193111,
    "model": "gpt-4-1106-vision-preview",
    "usage": {
        "prompt_tokens": 1118,
        "completion_tokens": 109,
        "total_tokens": 1227
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "This image shows a wooden boardwalk stretching through a lush green meadow with tall grasses on either side. The sky above is partly cloudy with blue patches, indicative of fair weather. There are scattered trees and shrubs in the distance, which suggest that the setting might be in a nature reserve, wetland, or a similar natural environment. The perspective of the boardwalk draws the viewer's eye towards the horizon, adding depth to the landscape. The warm lighting implies that the photo could have been taken during the late afternoon or early evening."
            },
            "finish_details": {
                "type": "stop",
                "stop": "<|fim_suffix|>"
            },
            "index": 0
        }
    ]
}


For this image, pricing works out like this:

(screenshot: GPT-4 Turbo with vision pricing table)

So 1,118 prompt tokens and 109 completion tokens, at $0.01/1K input and $0.03/1K output, comes to $0.01445 - about 1.5 cents.
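
As a sanity check, that calculation as code (the rates are the vision-preview prices at the time of writing):

def vision_cost_usd(prompt_tokens, completion_tokens):
    # $0.01 per 1K prompt tokens, $0.03 per 1K completion tokens
    return prompt_tokens / 1000 * 0.01 + completion_tokens / 1000 * 0.03

print(round(vision_cost_usd(1118, 109), 5))  # 0.01445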

I built a tool for that: https://chat.openai.com/g/g-jpbGrxLNf-gpt-pricing-calculator

simonw commented 11 months ago

I'm going to take advantage of the ability to have a custom template for the enrichment configuration page.

That will give me a way of switching between three modes: text prompt, text-plus-image prompt, or structured JSON prompt.
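
A hypothetical sketch of how those three modes might map onto the request body (build_messages and the mode names are placeholders, not final design):

def build_messages(mode, prompt, image_url=None):
    if mode == "text":
        return [{"role": "user", "content": prompt}]
    if mode == "image":
        # Text plus image, as in the vision example above
        return [{"role": "user", "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_url}},
        ]}]
    if mode == "json":
        # Structured output via a forced function call (tools example above)
        return [{"role": "user", "content": prompt}]
    raise ValueError(f"Unknown mode: {mode}")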

simonw commented 11 months ago

It should support 3.5 as well since that's 10-15x cheaper:

(screenshots: GPT-4 and GPT-3.5 Turbo pricing tables)

simonw commented 11 months ago

I need to think about rate limits, which are quite complicated as they vary depending on how old your account is etc.

https://platform.openai.com/docs/guides/rate-limits/rate-limits-in-headers?context=tier-free

They return rate limit information in HTTP headers, so a smart implementation would use those to adjust the speed at which we are hitting them:

(screenshot: the rate limit headers table from the OpenAI documentation)
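
A sketch of what using those headers might look like - parse_reset and wait_if_throttled are hypothetical names, and a real implementation would need more care:

import asyncio
import re

def parse_reset(duration):
    # Reset headers are durations like "12ms", "460ms", "14m24s" or
    # "28m3.723s"; convert them to seconds
    seconds = 0.0
    for value, unit in re.findall(r"([\d.]+)(ms|s|m|h)", duration):
        seconds += float(value) * {"ms": 0.001, "s": 1, "m": 60, "h": 3600}[unit]
    return seconds

async def wait_if_throttled(response):
    # Sleep until the request window resets once the allowance is exhausted
    remaining = int(response.headers.get("x-ratelimit-remaining-requests", "1"))
    if remaining == 0:
        reset = response.headers.get("x-ratelimit-reset-requests", "1s")
        await asyncio.sleep(parse_reset(reset))
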
simonw commented 11 months ago

Batching would be nice, but it looks like that's only available for the old completion API at the moment.

https://platform.openai.com/docs/guides/rate-limits/batching-requests says:

If you're hitting the limit on requests per minute, but have available capacity on tokens per minute, you can increase your throughput by batching multiple tasks into each request. This will allow you to process more tokens per minute, especially with our smaller models.

So I don't think it saves money, just reduces your rate of requests (but not tokens) per minute.

I'd actually like to not have to pay for the instruction / system prompt more than once. Ideally I'd be able to send a single system prompt and multiple data prompts in the same request - that's not supported by their API, but might be possible using prompt engineering. I'm worried about the impact different rows in the same batch may have on each other though.
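
If I did try the prompt engineering route, it might look something like this (entirely hypothetical, and untested against that cross-row contamination concern):

def build_batched_prompt(system_prompt, rows):
    # One system prompt, several rows in a single user message, with
    # instructions to answer each row independently
    numbered = "\n\n".join(
        f"### Row {i + 1}\n{row}" for i, row in enumerate(rows)
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": (
            "Answer each row independently. Return one numbered answer per row.\n\n"
            + numbered
        )},
    ]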

Here's some complex example code that adjusts the rate based on rules you give it, but does not seem to take the headers into account: https://github.com/openai/openai-cookbook/blob/feef1bf3982e15ad180e17732525ddbadaf2b670/examples/api_request_parallel_processor.py

simonw commented 11 months ago

I ran the earlier 3.5 curl with -i to see the headers:

x-ratelimit-limit-requests: 5000
x-ratelimit-limit-tokens: 160000
x-ratelimit-limit-tokens_usage_based: 160000
x-ratelimit-remaining-requests: 4999
x-ratelimit-remaining-tokens: 159974
x-ratelimit-remaining-tokens_usage_based: 159974
x-ratelimit-reset-requests: 12ms
x-ratelimit-reset-tokens: 9ms
x-ratelimit-reset-tokens_usage_based: 9ms

simonw commented 11 months ago

So for 3.5 it looks like my rate limits reset every ~10ms, which makes them effectively unlimited for this purpose.

simonw commented 11 months ago

Rate limits for vision are MUCH tighter:

openai-model: gpt-4-1106-vision-preview
openai-organization: user-r3e61fpak04cbaokp5buoae4
openai-processing-ms: 5627
openai-version: 2020-10-01
x-ratelimit-limit-requests: 100
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 99
x-ratelimit-remaining-tokens: 39693
x-ratelimit-reset-requests: 14m24s
x-ratelimit-reset-tokens: 460ms

Ran that again a few seconds later:

x-ratelimit-limit-requests: 100
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 98
x-ratelimit-remaining-tokens: 39693
x-ratelimit-reset-requests: 28m3.723s
x-ratelimit-reset-tokens: 460ms

Weird that the x-ratelimit-reset-requests went up from 14m to 28m - each request seems to add another 14m24s to the reset clock.

14.4 minutes × 100 requests is exactly 24 hours, so this looks like a limit of 100 requests per day rather than 100 per 15 minutes.

A third request:

x-ratelimit-limit-requests: 100
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 97
x-ratelimit-remaining-tokens: 39693
x-ratelimit-reset-requests: 41m22.419s
x-ratelimit-reset-tokens: 460ms

Now it's up to 41m for the reset - consistent with each request adding roughly another 14 minutes.

simonw commented 11 months ago

https://platform.openai.com/account/limits shows my own rate limits. It looks like vision currently has a hard 100/day limit for almost all users.

(screenshot: vision model rate limits)

I'm currently in tier 3:

(screenshot: current usage tier)

Tier 3 limits:

(screenshot: tier 3 rate limits table)

Tier 4:

(screenshot: tier 4 rate limits table)

https://platform.openai.com/docs/guides/rate-limits/usage-tiers

Even upgrading to tier 4 would still limit me to 100 image requests a day.

simonw commented 11 months ago

So regular (expensive) GPT-4 is currently 5,000 requests/minute on tier 3, but GPT-4 Turbo is only 500. I wonder how much that will change when gpt-4-1106-preview becomes the new default, which I think is happening soon.

simonw commented 11 months ago

For most models I can make a cheap API request to figure out the available rate limits before the user kicks off the process, to help them understand if they have enough capacity or not.

But not for GPT vision, because that one (currently) only allows 100 requests a day and I don't want to burn one just to check the rate limit.

I asked about that on the forum here: https://community.openai.com/t/possible-to-check-api-rate-limit-headers-without-burning-a-request/510041
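
For the models that do allow it, the probe could be as simple as a one-token request made purely to read the headers (probe_rate_limits is a hypothetical name):

import os
import httpx

def probe_rate_limits(model="gpt-3.5-turbo"):
    # Cheapest possible request - one max token - just to get the
    # x-ratelimit-* headers off the response
    response = httpx.post(
        "https://api.openai.com/v1/chat/completions",
        headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
        json={
            "model": model,
            "messages": [{"role": "user", "content": "hi"}],
            "max_tokens": 1,
        },
    )
    return {
        key: value
        for key, value in response.headers.items()
        if key.startswith("x-ratelimit-")
    }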

What I could do instead is maintain a _enrichments_gpt_rate_limits table, updated for each organization / model pair after every request to store the latest limits from headers like these (a sketch follows below):

openai-model: gpt-3.5-turbo-0613
openai-organization: user-r3e61fpak04cbaokp5buoae4
openai-processing-ms: 865
openai-version: 2020-10-01
strict-transport-security: max-age=15724800; includeSubDomains
x-ratelimit-limit-requests: 5000
x-ratelimit-limit-tokens: 160000
x-ratelimit-limit-tokens_usage_based: 160000
x-ratelimit-remaining-requests: 4999
x-ratelimit-remaining-tokens: 159974
x-ratelimit-remaining-tokens_usage_based: 159974
x-ratelimit-reset-requests: 12ms
x-ratelimit-reset-tokens: 9ms
x-ratelimit-reset-tokens_usage_based: 9ms
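
A sketch of that table update, assuming sqlite-utils (the table and column names are provisional):

import sqlite_utils

def record_rate_limits(db: sqlite_utils.Database, headers: dict):
    # Upsert the latest limits after every response, keyed on the
    # (organization, model) pair from the response headers
    db["_enrichments_gpt_rate_limits"].upsert(
        {
            "organization": headers.get("openai-organization"),
            "model": headers.get("openai-model"),
            "limit_requests": headers.get("x-ratelimit-limit-requests"),
            "remaining_requests": headers.get("x-ratelimit-remaining-requests"),
            "limit_tokens": headers.get("x-ratelimit-limit-tokens"),
            "remaining_tokens": headers.get("x-ratelimit-remaining-tokens"),
        },
        pk=("organization", "model"),
    )
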
simonw commented 11 months ago

I worry a bit about the extra write traffic from all of those updates, but I imagine it will be fine - if it causes problems I can stop doing it.

simonw commented 11 months ago

For the first release I'll do something simple: watch the remaining tokens and cancel the run if they run out.
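
That check could be as simple as this (should_cancel and the reserve threshold are hypothetical):

def should_cancel(headers, reserve_tokens=1000):
    # Stop the run once the remaining-token allowance drops below a
    # small reserve, rather than erroring out mid-run
    remaining = int(headers.get("x-ratelimit-remaining-tokens", "0"))
    return remaining < reserve_tokens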

Refs: