dinosauria123 / gcv2hocr

gcv2hocr converts from Google Cloud Vision OCR output to hocr to make a searchable pdf.

KeyError: 'description' when no description is found #23

Closed goodevilgenius closed 5 years ago

goodevilgenius commented 5 years ago

Occasionally, Cloud Vision returns a textAnnotation that has no description. This can happen when a bullet point is found in the text.

Here is one such annotation:

{
  "boundingPoly": {
    "vertices": [
      {
        "x": 365,
        "y": 749
      },
      {
        "x": 373,
        "y": 749
      },
      {
        "x": 373,
        "y": 780
      },
      {
        "x": 365,
        "y": 780
      }
    ]
  }
}

When this happens, gcv2hocr fails with the following output:

Traceback (most recent call last):
  File "/home/drj/.local/bin/gcv2hocr.py", line 166, in <module>
    page = fromResponse(resp, **args.__dict__)
  File "/home/drj/.local/bin/gcv2hocr.py", line 105, in fromResponse
    word = GCVAnnotation(ocr_class='ocrx_word', content=anno_json['description'], box=box)
KeyError: 'description'
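
One way to avoid the crash is to look up the key with `dict.get` and skip any annotation that has no text. The snippet below is only a minimal sketch of that idea, not the actual gcv2hocr code or the fix that was merged: `words_from_annotations` is a hypothetical helper, and it returns plain `(text, box)` tuples instead of building `GCVAnnotation` objects. Defaulting a missing `x` or `y` to 0 is also an assumption on my part.

```python
def words_from_annotations(annotations):
    """Yield (text, box) pairs, skipping annotations that have no 'description'.

    Hypothetical helper for illustration; it does not reproduce gcv2hocr's
    GCVAnnotation class.
    """
    for anno_json in annotations:
        # .get() returns None instead of raising KeyError when the key is absent.
        text = anno_json.get('description')
        if text is None:
            # Nothing to put in the hOCR word, so skip this annotation.
            continue
        vertices = anno_json.get('boundingPoly', {}).get('vertices', [])
        # Assumption: a missing coordinate is treated as 0.
        box = [(v.get('x', 0), v.get('y', 0)) for v in vertices]
        yield text, box
```

Passing the annotation shown above through this helper yields nothing for it, while annotations that do carry a description pass through unchanged. If the empty box should still appear in the output, an alternative would be `anno_json.get('description', '')`, which keeps the annotation with empty content.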