ftnext / ocroy

https://pypi.org/project/ocroy/
MIT License
0 stars 0 forks

Know how to read text written in an image with GPT-4V #1

Closed: ftnext closed this issue 11 months ago

ftnext commented 11 months ago

https://platform.openai.com/docs/guides/vision

ftnext commented 11 months ago
import base64
from openai import OpenAI

client = OpenAI()

with open("kanji.png", "rb") as fb:
    base64_image = base64.b64encode(fb.read()).decode()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",
    temperature=0,
    messages=[
        {
            "role": "user",
            "content": [
                # Prompt (Japanese): "What is written in Japanese in the image?"
                {"type": "text", "text": "画像の中に日本語でなんと書いてありますか?"},
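                {
                    "type": "image_url",
                    # The vision guide documents image_url as an object with a "url" key;
                    # a base64 data URL is passed the same way as a regular URL.
                    "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                },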
            ],
        }
    ],
    max_tokens=1000,
)

# The text the model read from the image is in the first choice's message.
print(response.choices[0].message.content)
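
The snippet above hardcodes `kanji.png` and the `image/png` MIME type. As a minimal sketch of how the payload could be built for other image files, the helper below guesses the MIME type from the file extension and returns the `messages` list. The name `build_vision_messages` and the PNG fallback are assumptions made for illustration; they are not part of ocroy or the OpenAI SDK.

```python
import base64
import mimetypes
from pathlib import Path


def build_vision_messages(image_path, prompt):
    """Build a `messages` list with one text part and one base64 image part.

    Hypothetical helper for illustration; not part of ocroy or the OpenAI SDK.
    """
    path = Path(image_path)
    # Guess the MIME type from the file extension.
    # Falling back to PNG for unknown extensions is an assumption.
    mime_type, _ = mimetypes.guess_type(path.name)
    if mime_type is None:
        mime_type = "image/png"
    # Encode the raw image bytes as base64 text for the data URL.
    encoded = base64.b64encode(path.read_bytes()).decode("ascii")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:{mime_type};base64,{encoded}"},
                },
            ],
        }
    ]
```

The returned list can be passed as `messages=build_vision_messages("kanji.png", "画像の中に日本語でなんと書いてありますか?")` to the same `client.chat.completions.create(...)` call, with `model`, `temperature`, and `max_tokens` unchanged.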