SuffolkLITLab / FormFyxer

A tool for learning about and pre-processing forms
MIT License

If the PDF has no text, OCR it #112

Closed: nonprofittechy closed this 1 year ago

nonprofittechy commented 1 year ago

Looks like this was an oversight originally. We already invoke OCRmyPDF if we get garbage readability results or if the OpenAI call fails. This adds an OCR pass when the PDF has no text layer at all.
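For anyone skimming, here is a rough sketch of the "no text layer, so OCR it" check described above. It is not the code from this PR: `has_text_layer`, `extract_with_ocr_fallback`, and the `extract_text` / `run_ocr` callables are hypothetical names used only for illustration. In a real pipeline `run_ocr` would presumably wrap OCRmyPDF and `extract_text` would wrap whatever PDF text extractor is in use.

```python
from typing import Callable, Iterable, List, Optional


def has_text_layer(page_texts: Iterable[Optional[str]], min_chars: int = 1) -> bool:
    """Return True if the pages hold at least `min_chars` non-whitespace characters."""
    # str.split() with no arguments drops all whitespace, so blank pages count as 0.
    total = sum(len("".join((text or "").split())) for text in page_texts)
    return total >= min_chars


def extract_with_ocr_fallback(
    pdf_path: str,
    extract_text: Callable[[str], List[str]],  # hypothetical: path -> per-page text
    run_ocr: Callable[[str], str],             # hypothetical: path -> path of OCRed copy
) -> List[str]:
    """Extract text; if the PDF has no text layer, OCR it once and extract again."""
    pages = extract_text(pdf_path)
    if has_text_layer(pages):
        return pages
    # No usable text: assume a scanned document and run a single OCR pass.
    ocr_path = run_ocr(pdf_path)
    return extract_text(ocr_path)
```

Passing the extractor and OCR step in as callables keeps the decision logic testable without a real PDF or an OCR install. Raising `min_chars` would treat near-empty PDFs (a stray page number, say) as having no text layer, though that threshold is an assumption and not something this PR specifies.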

See: https://github.com/SuffolkLITLab/RateMyPDF/issues/6