datasette / datasette-enrichments-gpt

Datasette enrichment for analyzing row data using OpenAI's GPT models
Apache License 2.0

GPT Vision #2

Closed simonw closed 9 months ago

simonw commented 10 months ago

Part of:

simonw commented 10 months ago

I tried to see if I could get it to generate alt captions for a bunch of images in a single API request (since I only get 100 requests a day):

curl -i https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Provide alt text for each of these images, as a JSON array of strings"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-1.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-2.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-3.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-4.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-5.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-6.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-7.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-8.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-9.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-10.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-11.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-12.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-13.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-14.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-15.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-16.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-17.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-18.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-19.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-20.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-21.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-22.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-23.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-24.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-25.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-26.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-27.jpeg?w=400&auto=compress"
                    }
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-28.jpeg?w=400&auto=compress"
                    }
                }
            ]
        }
    ]
  }' | tee /tmp/captions.json
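The 28 near-identical image_url entries in that request follow a single URL pattern, so the body could also be generated rather than written by hand. A minimal Python sketch, assuming the pioneer-history-N.jpeg numbering runs from 1 to 28 as in the curl command; build_caption_request is a hypothetical helper name, not something from the plugin:

```python
import json

# URL pattern copied from the curl command; {n} is the image number.
URL_TEMPLATE = "https://niche-museums.imgix.net/pioneer-history-{n}.jpeg?w=400&auto=compress"


def build_caption_request(count: int = 28) -> dict:
    """Build the same JSON body as the curl command: one text part plus `count` images."""
    content = [
        {
            "type": "text",
            "text": "Provide alt text for each of these images, as a JSON array of strings",
        }
    ]
    for n in range(1, count + 1):
        content.append(
            {"type": "image_url", "image_url": {"url": URL_TEMPLATE.format(n=n)}}
        )
    return {
        "model": "gpt-4-vision-preview",
        "messages": [{"role": "user", "content": content}],
    }


if __name__ == "__main__":
    # Print the body so it can be saved to a file and passed to curl with -d @file.
    print(json.dumps(build_caption_request(), indent=2))
```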

I first tried variants of this that used tools:

"tools": [
        {
            "type": "function",
            "function": {
                "name": "set_image_captions",
                "description": "Set captions for all of the images",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "captions": {
                            "type": "array",
                            "items": {
                                "type": "string"
                            },
                            "minItems": 28,
                            "maxItems": 28,
                            "description": "An array of captions for each of the images"
                        }
                    },
                    "required": [
                        "captions"
                    ]
                }
            }
        }
    ]
}

And one that used this:

"response_format": { "type": "json_object" },

Both of these failed with a validation error about extra fields, so presumably that model does not support them. Each failed call still burned one of my 100/day requests, too!

The request that passed validation still didn't give me captions. It burned a request and returned this refusal:

{
    "id": "chatcmpl-8LvfXOHZlp9dZfhnVzFSFO5Qo7B53",
    "object": "chat.completion",
    "created": 1700237259,
    "model": "gpt-4-1106-vision-preview",
    "usage": {
        "prompt_tokens": 7672,
        "completion_tokens": 12,
        "total_tokens": 7684
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "I'm sorry, I cannot provide assistance with that request."
            },
            "finish_details": {
                "type": "stop",
                "stop": "<|fim_suffix|>"
            },
            "index": 0
        }
    ]
}

I think it cost me 7.71c to get that error.
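That figure can be reproduced from the usage block in the response. The sketch below assumes prices of $0.01 per 1,000 prompt tokens and $0.03 per 1,000 completion tokens; those prices are not stated in this thread, so treat them as an assumption and check current pricing:

```python
# Assumed prices in USD per 1,000 tokens (not stated in this thread).
INPUT_USD_PER_1K = 0.01
OUTPUT_USD_PER_1K = 0.03


def cost_in_cents(prompt_tokens: int, completion_tokens: int) -> float:
    """Estimate the cost of one request in US cents from its token usage."""
    usd = (
        prompt_tokens / 1000 * INPUT_USD_PER_1K
        + completion_tokens / 1000 * OUTPUT_USD_PER_1K
    )
    return usd * 100


# Token counts from the refusal response above: 7672 prompt, 12 completion.
print(round(cost_in_cents(7672, 12), 2))  # 7.71
```

Almost all of the cost comes from the prompt side: the 28 images account for the bulk of the 7,672 prompt tokens, while the 12-token refusal is negligible.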

simonw commented 10 months ago

Running against a single image worked:

curl -i https://api.openai.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -d '{
    "model": "gpt-4-vision-preview",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Provide alt text for each of these images, as a JSON array of strings"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://niche-museums.imgix.net/pioneer-history-1.jpeg?w=400&auto=compress"
                    }
                }
            ]}]}'

I got back:

{
    "id": "chatcmpl-8Lvjd5rP3193UKvDQWKnaXmRtLLuJ",
    "object": "chat.completion",
    "created": 1700237513,
    "model": "gpt-4-1106-vision-preview",
    "usage": {
        "prompt_tokens": 277,
        "completion_tokens": 16,
        "total_tokens": 293
    },
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": "```json\n[\n  \"A model of a complex of buildings with intricate detailing"
            },
            "finish_details": {
                "type": "max_tokens"
            },
            "index": 0
        }
    ]
}
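The finish_details type of max_tokens in that response is the signal that the caption was cut off. A small sketch of checking for it in Python, assuming the response has already been parsed into a dict; was_truncated is a hypothetical helper, and the finish_reason fallback covers the field used by non-preview chat completion responses:

```python
def was_truncated(response: dict) -> bool:
    """Return True if the first choice stopped because it hit the token limit."""
    choice = response["choices"][0]
    # The vision preview response shown above reports finish_details.type.
    details = choice.get("finish_details") or {}
    if details.get("type") == "max_tokens":
        return True
    # Standard chat completion responses report finish_reason == "length" instead.
    return choice.get("finish_reason") == "length"


# Trimmed copy of the single-image response above.
response = {
    "choices": [
        {
            "message": {"role": "assistant", "content": "```json\n[\n  \"A model of"},
            "finish_details": {"type": "max_tokens"},
            "index": 0,
        }
    ]
}
print(was_truncated(response))  # True
```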

And headers:

date: Fri, 17 Nov 2023 16:11:53 GMT
content-type: application/json
content-length: 409
openai-model: gpt-4-1106-vision-preview
openai-organization: user-r3e61fpak04cbaokp5buoae4
openai-processing-ms: 8364
openai-version: 2020-10-01
x-ratelimit-limit-requests: 100
x-ratelimit-limit-tokens: 40000
x-ratelimit-remaining-requests: 95
x-ratelimit-remaining-tokens: 39965
x-ratelimit-reset-requests: 1h4m25.463s
x-ratelimit-reset-tokens: 52ms
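With only 100 requests a day, the x-ratelimit-remaining-requests and x-ratelimit-reset-requests headers are worth reading before sending more calls. A rough sketch of converting the reset values into seconds, assuming they always look like the two values seen here (1h4m25.463s and 52ms); reset_to_seconds is a hypothetical helper and the format is inferred from these examples only:

```python
import re

# Seconds per unit; "ms" is listed first in the regex so it wins over "m".
UNIT_SECONDS = {"h": 3600.0, "m": 60.0, "s": 1.0, "ms": 0.001}
PART = re.compile(r"(\d+(?:\.\d+)?)(ms|h|m|s)")
WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|h|m|s))+")


def reset_to_seconds(value: str) -> float:
    """Convert a duration such as '1h4m25.463s' or '52ms' into seconds."""
    if not WHOLE.fullmatch(value):
        raise ValueError(f"unrecognised duration: {value!r}")
    return sum(float(number) * UNIT_SECONDS[unit] for number, unit in PART.findall(value))


headers = {
    "x-ratelimit-remaining-requests": "95",
    "x-ratelimit-reset-requests": "1h4m25.463s",
    "x-ratelimit-reset-tokens": "52ms",
}
remaining = int(headers["x-ratelimit-remaining-requests"])
wait = reset_to_seconds(headers["x-ratelimit-reset-requests"])
print(f"{remaining} requests left, limit resets in about {wait:.0f}s")
```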

A model of a complex of buildings with intricate detailing

For this image:

simonw commented 10 months ago

Code from my prototype:


    async def gpt4_vision(self, api_key, prompt, image_url, system=None) -> str:
        # Build the message list: optional system prompt, then text plus image.
        messages = []
        if system:
            # TODO: Check that gpt-4-vision-preview supports system prompts
            messages.append({"role": "system", "content": system})
        messages.append({
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}}
            ]
        })
        # Assumes _chat_completion is a coroutine, so it is awaited to return a str.
        return await self._chat_completion(api_key, "gpt-4-vision-preview", messages)

simonw commented 9 months ago

They've increased the daily rate limit on vision quite a bit: 1500/day on tier 3. https://twitter.com/owencm/status/1730356831476998399

simonw commented 9 months ago

OK weird, it looks like the default max tokens is less than 100.

This photo showcases a room with an extensive collection of cultural artifacts and artworks. On

https://platform.openai.com/docs/guides/vision says:

we currently set a low max_tokens default which you can override.
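Overriding that default means adding a max_tokens field to the request body. A minimal sketch of what that might look like; vision_request_body is a hypothetical helper, the example image URL is the one from the earlier curl command, and the value 1000 is only an illustrative choice:

```python
def vision_request_body(prompt: str, image_url: str, max_tokens: int = 1000) -> dict:
    """Build a chat completion body for one image with an explicit max_tokens."""
    return {
        "model": "gpt-4-vision-preview",
        # Without this field the preview model stops after a short default limit.
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


body = vision_request_body(
    "Describe this photo",
    "https://niche-museums.imgix.net/pioneer-history-1.jpeg?w=400&auto=compress",
)
print(body["max_tokens"])  # 1000
```

A higher limit also raises the possible completion cost per request, so it may be worth keeping the value close to the length of caption actually wanted.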

simonw commented 9 months ago

Wow, I bumped up to 1000 tokens and got a LOT of text:

This image shows a collection of various art pieces and artifacts on display within an indoor setting. The collection has an ethnographic or cultural theme, with a particular emphasis on African art.

On the left wall, there are several items hung up, including what appears to be a skull of a horned animal, underneath which is a sculpture of a slender figure and a smaller, oblong-shaped item which could be a shield or decorative piece. A black mantle clock sits on a white shelf below these items, alongside a rustic-looking wooden box.

In the center of the image, you can see three large, vertically-oriented masks or sculptures with intricate designs and features that are characteristic of traditional African art. These are suspended in front of a canvas painting depicting a vibrant landscape with mountains in the background, possibly indicating the cultural origin of the artworks.

To the right, there are two wood-carved figures which look like masks or totems with elongated faces and detailed carvings. Below these figures are more sculptures of various shapes and sizes, one resembling an animal and another simulating a face.

The room has a neutral-colored wall which sets a calm background for the richly detailed and variously colored art pieces. The composition of artifacts suggests a collector or an enthusiast's appreciation for cultural pieces, possibly from African origins or inspired by African styles.

Prompt was "Describe this photo" - image was:

simonw commented 9 months ago

UI looks like this:

vision