OregonDigital / OD2

Next generation of the Oregon Digital (https://oregondigital.org) digital collections platform, built on Samvera Hyrax (https://github.com/samvera/hyrax/).

Skip saving BBOX if solr/tika returned bad extracted text #3109

Closed: CGillen closed this 2 months ago

CGillen commented 2 months ago

A bad extracted-text result from Solr/Tika was preventing Tesseract from running on PDFs that needed OCR.
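To illustrate the idea behind this fix, here is a minimal Ruby sketch of the guard: only save bounding boxes when the text Solr/Tika extracted looks usable, and otherwise fall through to OCR. Everything here is hypothetical and not taken from the OD2 codebase: the names `ExtractedTextGuard`, `process_pdf_text`, `save_bbox:` and `run_ocr:` are illustrative, and the "bad text" heuristic (blank text, or under 80% letters/digits/whitespace) is an assumption, since the PR does not say how bad text is detected.

```ruby
# Hypothetical guard: decides whether text extracted by Solr/Tika is usable.
# The heuristic below is an assumption, not OD2's actual check.
module ExtractedTextGuard
  # Assumed minimum share of characters that must be letters, digits, or whitespace.
  MIN_READABLE_RATIO = 0.8

  def self.usable?(text)
    return false if text.nil?

    stripped = text.strip
    return false if stripped.empty?

    # [[:alnum:]] and [[:space:]] are Unicode-aware, so accented letters count as readable.
    readable = stripped.scan(/[[:alnum:][:space:]]/).size
    readable.fdiv(stripped.length) >= MIN_READABLE_RATIO
  end
end

# Hypothetical pipeline step: save bounding boxes only for usable text;
# otherwise skip the save so the OCR (e.g. Tesseract) path can run instead.
def process_pdf_text(extracted_text, save_bbox:, run_ocr:)
  if ExtractedTextGuard.usable?(extracted_text)
    save_bbox.call(extracted_text)
    :bbox_saved
  else
    run_ocr.call
    :ocr
  end
end
```

Passing the save and OCR steps in as callables keeps the sketch independent of any real Hyrax job or service class. In a real app these would be whatever jobs actually persist the BBOX data and enqueue Tesseract.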