dinosauria123 / gcv2hocr

gcv2hocr converts Google Cloud Vision OCR output to hOCR, which can then be used to make a searchable PDF.

Fix negative bbox values (so they do not occur) #18

Closed skylord123 closed 6 years ago

skylord123 commented 6 years ago

I had issues generating the OCR overlay with the hocr-pdf tool from hocr-tools (https://github.com/tmbdev/hocr-tools): bbox values should be unsigned integers, but gcv2hocr.py generates bboxes with negative numbers when the Vision JSON contains negative coordinates. With this fix, the generated hOCR follows the spec, and the invisible OCR text still shows up in the correct place, so nothing breaks.
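For illustration, here is a minimal sketch of the kind of clamping described above. It is not the actual diff from this PR. The function name `bbox_from_vertices` and the sample data are hypothetical, and the sketch assumes each vertex is a dict with optional integer `x` and `y` keys, with a missing key treated as 0.

```python
def bbox_from_vertices(vertices):
    """Build an hOCR-style 'bbox x0 y0 x1 y1' string with no negative values.

    `vertices` is assumed to be a list of dicts with optional "x"/"y" keys;
    a missing key is treated as 0 (an assumption of this sketch).
    """
    # Clamp every coordinate to 0 so the bbox never contains a negative number.
    xs = [max(0, int(v.get("x", 0))) for v in vertices]
    ys = [max(0, int(v.get("y", 0))) for v in vertices]
    # hOCR bbox order is left, top, right, bottom.
    return "bbox {} {} {} {}".format(min(xs), min(ys), max(xs), max(ys))


# Hypothetical sample: a box whose left edge falls slightly off the page.
sample = [
    {"x": -3, "y": 12},
    {"x": 140, "y": 12},
    {"x": 140, "y": 40},
    {"x": -3, "y": 40},
]
print(bbox_from_vertices(sample))  # bbox 0 12 140 40
```

Clamping to 0 only moves the box edge by the amount it extended past the page, which is consistent with the overlay text still landing in the right place.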

Here is the hOCR bbox spec for reference: http://kba.cloud/hocr-spec/1.2/#bbox

And here are the issue and PR I created for hocr-tools before I realized the problem was in this project instead: https://github.com/tmbdev/hocr-tools/issues/127 and https://github.com/tmbdev/hocr-tools/pull/128