Experiment with ChatGPT vision API for document extraction

I played with the OpenAI Vision API today for an hour or so. My mind is blown.

I uploaded this scan:

(08)West'sPacificReporter,3rd Series,April_5_2024,pages 38-624

With the following Python code:

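import base64
import os

import requests

# Assumption: the API key is stored in the OPENAI_API_KEY environment variable.
api_key = os.environ["OPENAI_API_KEY"]

def encode_image(image_path):
    # Read the scan and return it as a base64 string for the data URL.
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

image_path = "./(08)West'sPacificReporter,3rd Series,April_5_2024,pages.38-624.png"
base64_image = encode_image(image_path)

headers = {
 "Content-Type": "application/json",
 "Authorization": f"Bearer {api_key}"
}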

payload = {
 "model": "gpt-4-turbo",
 "messages": [
   {
     "role": "user",
     "content": [
       {
         "type": "text",
         "text": (
           "Extract the text from this image. It is a scan of the table of contents from a legal book, "
           "but the paper is very thin and the words on the back of the page bleed through. "
           "Assume that it only has ascii characters. After you've extracted the text, convert it into "
           "JSON with three keys: The name of the case, the court where it happened, and the page in the book it's on."
         )
       },
       {
         "type": "image_url",
         "image_url": {
           "url": f"data:image/png;base64,{base64_image}"
         }
       }
     ]
   }
 ],
 "max_tokens": 3000
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

That worked amazingly:

print(response.json()['choices'][0]['message']['content'])
The extracted cases and details from the image are:

1. Hernandez v. State – Alaska App. – Page 40
2. Cardenas v. Holmberg in and for County of Pinal – Ariz.App. Div. 2 – Page 108
3. PNC Bank, N.A. v. Coury in and for County of Maricopa – Ariz.App. Div. 1 – Page 88
4. Riggins, In re – Ariz. – Page 64
5. State v. Dayton – Ariz.App. Div. 2 – Page 94
6. Witt, Matter of – Ariz. – Page 75
7. People v. Ruiz – Cal. – Page 38
8. People v. Saldana – Cal. – Page 39
9. People v. S.C. – Cal. – Page 38
10. People v. Wiley – Cal. – Page 38
11. Nicola v. City of Grand Junction – Colo. App. – Page 120
12. People v. Mion – Colo.App. – Page 111
13. State v. Elefante – Hawai‘i – Page 134

The JSON format, according to your request:
```json
[
    {
        "name": "Hernandez v. State",
        "court": "Alaska App.",
        "page": "40"
    },
    {
        "name": "Cardenas v. Holmberg in and for County of Pinal",
        "court": "Ariz.App. Div. 2",
        "page": "108"
    },
    {
        "name": "PNC Bank, N.A. v. Coury in and for County of Maricopa",
        "court": "Ariz.App. Div. 1",
        "page": "88"
    },
    {
        "name": "Riggins, In re",
        "court": "Ariz.",
        "page": "64"
    },
    {
        "name": "State v. Dayton",
        "court": "Ariz.App. Div. 2",
        "page": "94"
    },
    {
        "name": "Witt, Matter of",
        "court": "Ariz.",
        "page": "75"
    },
    {
        "name": "People v. Ruiz",
        "court": "Cal.",
        "page": "38"
    },
    {
        "name": "People v. Saldana",
        "court": "Cal.",
        "page": "39"
    },
    {
        "name": "People v. S.C.",
        "court": "Cal.",
        "page": "38"
    },
    {
        "name": "People v. Wiley",
        "court": "Cal.",
        "page": "38"
    },
    {
        "name": "Nicola v. City of Grand Junction",
        "court": "Colo. App.",
        "page": "120"
    },
    {
        "name": "People v. Mion",
        "court": "Colo.App.",
        "page": "111"
    },
    {
        "name": "State v. Elefante",
        "court": "Hawai‘i",
        "page": "134"
    }
]
```

As far as I can tell, it's perfect, and it even got the okina correct in Hawai‘i. This would have taken weeks of developer time in the past, but now you just ask a computer to do it.

Wow.
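The reply mixes a prose list with a fenced JSON block, so anything downstream has to pull the JSON out first. Here is a minimal sketch of one way to do that. The helper name `extract_json_block` and the shortened sample reply are made up for illustration, and it assumes the model closes its fence (a reply cut off by `max_tokens` would fail to parse).

```python
import json
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

def extract_json_block(reply):
    """Parse the first fenced block in `reply`, or the whole reply if unfenced."""
    pattern = FENCE + r"(?:json)?\s*(.*?)" + FENCE
    match = re.search(pattern, reply, re.DOTALL)
    raw = match.group(1) if match else reply
    return json.loads(raw)

# Hypothetical, shortened reply in the same shape as the output above.
sample_reply = (
    "The JSON format, according to your request:\n"
    + FENCE + "json\n"
    + '[{"name": "Hernandez v. State", "court": "Alaska App.", "page": "40"}]\n'
    + FENCE
)

cases = extract_json_block(sample_reply)
print(cases[0]["name"], int(cases[0]["page"]))
```

With the real response, the input would be `response.json()['choices'][0]['message']['content']`. Pages come back as strings, so they need an `int()` if you want numbers.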

A few things I learned:

- Using ChatGPT itself, even the paid version, just uses Tesseract and sucks.
- To get this going, you have to set up billing first and then make a new API key, or else it'll say you cannot access the turbo model.
- It cost about a penny.
- This AI stuff is real, and our theory that we can use this while scanning books is probably correct.
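Since the billing and API key problem only shows up as an error in the response, it helps to look at the status code before reading `choices`. The sketch below is one way to summarize a response. The function name `summarize_api_response` and the sample bodies are hypothetical; it assumes errors arrive as an `error` object with a `message` field and that successful replies carry a `usage` object with `total_tokens`.

```python
def summarize_api_response(status_code, body):
    """Return a one-line summary of a chat completions response body."""
    if status_code == 200:
        usage = body.get("usage", {})
        return f"ok: {usage.get('total_tokens', 'unknown')} tokens used"
    error = body.get("error", {})
    message = error.get("message", "no error message in response")
    return f"HTTP {status_code}: {message}"

# Hypothetical bodies, only to show both branches.
print(summarize_api_response(200, {"usage": {"total_tokens": 1234}}))
print(summarize_api_response(404, {"error": {"message": "model not found"}}))
```

With the request above, the call would be `summarize_api_response(response.status_code, response.json())`. The token count is also what the bill is based on, which is a quick way to sanity-check the "about a penny" figure against current pricing.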

The frontend interface for gemini.google.com returned these results with the same inputs:

[
  {
    "name_of_case": "Hernandez v. State",
    "court": "Alaska App.",
    "page": "40"
  },
  {
    "name_of_case": "Cardenas v. Holmberg in and for County of Pinal-Ariz. App. Div. 2",
    "court": "Ariz. App. Div. 2",
    "page": "64"
  },
  {
    "name_of_case": "Riggins, In re-Ariz.",
    "court": "Ariz.",
    "page": "108"
  },
  {
    "name_of_case": "State v. Dayton-Ariz.App. Div. 2",
    "court": "Ariz.App. Div. 2",
    "page": "94"
  },
  {
    "name_of_case": "PNC Bank, N.A. v. Coury in and for County of Maricopa-Ariz.App. Div. 1",
    "court": "Ariz.App. Div. 1",
    "page": "88"
  },
  {
    "name_of_case": "People v. Ruiz-Cal.",
    "court": "Cal.",
    "page": "38"
  },
  {
    "name_of_case": "People v. S.C. Cal.",
    "court": "Cal.",
    "page": "38"
  },
  {
    "name_of_case": "People v. Saldana-Cal....",
    "court": "Cal.",
    "page": "39"
  },
  {
    "name_of_case": "People v. Wiley-Cal.",
    "court": "Cal.",
    "page": "38"
  },
  {
    "name_of_case": "Nicola v. City of Grand Junction Colo. App.",
    "court": "Colo.App.",
    "page": "111"
  },
  {
    "name_of_case": "People v. Mion-Colo.App.",
    "court": "Colo.App.",
    "page": "120"
  },
  {
    "name_of_case": "State v. Elefante-Hawai'i.",
    "court": "Hawai'i.",
    "page": "134"
  }
]
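To compare the two outputs without eyeballing them, here is a rough sketch that treats the first (OpenAI) result as the reference, since that one looked correct, and lists the cases where the page numbers disagree. The function name `page_mismatches` is made up, only three entries from each output are copied in, and the prefix match on names is a crude workaround for the court being appended to the case name in the second output.

```python
def page_mismatches(reference, candidate, name_key="name_of_case"):
    """List (name, reference_page, candidate_page) where the pages disagree.

    A candidate entry matches a reference entry when its name starts with
    the reference name.
    """
    mismatches = []
    for ref in reference:
        for cand in candidate:
            same_case = cand[name_key].startswith(ref["name"])
            if same_case and cand["page"] != ref["page"]:
                mismatches.append((ref["name"], ref["page"], cand["page"]))
    return mismatches

# Three entries copied from each of the outputs above.
openai_cases = [
    {"name": "Hernandez v. State", "court": "Alaska App.", "page": "40"},
    {"name": "Riggins, In re", "court": "Ariz.", "page": "64"},
    {"name": "People v. Mion", "court": "Colo.App.", "page": "111"},
]
gemini_cases = [
    {"name_of_case": "Hernandez v. State", "court": "Alaska App.", "page": "40"},
    {"name_of_case": "Riggins, In re-Ariz.", "court": "Ariz.", "page": "108"},
    {"name_of_case": "People v. Mion-Colo.App.", "court": "Colo.App.", "page": "120"},
]

for name, ref_page, cand_page in page_mismatches(openai_cases, gemini_cases):
    print(f"{name}: {ref_page} vs {cand_page}")
```

On this sample it prints `Riggins, In re: 64 vs 108` and `People v. Mion: 111 vs 120`. A prefix match can pair the wrong entries when one case name is a prefix of another, so this is only a quick check, not a real record matcher.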

freelawproject / courtlistener

Experiment with ChatGPT vision API for document extraction #3998