DS4SD / docling

Get your documents ready for gen AI
https://ds4sd.github.io/docling
MIT License

feat(OCR): Introduce the OcrOptions.force_full_page_ocr parameter that forces a full page OCR scanning #290

Closed nikos-livathinos closed 1 week ago

nikos-livathinos commented 1 week ago

In some cases the user may want to force full-page OCR and ignore the text embedded in a programmatic PDF (see issue #185).

This PR introduces the OcrOptions.force_full_page_ocr parameter, which implements this feature.

Please check this example that demonstrates how to force OCR: https://github.com/DS4SD/docling/blob/force_ocr/docs/examples/full_page_ocr.py
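
For readers without access to the linked example, the behavior described above can be sketched in plain Python. This is a stand-in model, not docling's implementation: `SketchOcrOptions`, `Page`, and `plan_ocr` are hypothetical names invented for illustration, and only the `force_full_page_ocr` field name comes from this PR. It also assumes that, without the flag, OCR is limited to bitmap regions and the embedded text is kept.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Bounding box as (left, top, right, bottom) in page units.
BBox = Tuple[float, float, float, float]


@dataclass
class SketchOcrOptions:
    # Hypothetical stand-in for OcrOptions; only this field name is from the PR.
    force_full_page_ocr: bool = False


@dataclass
class Page:
    # Hypothetical page model used only for this sketch.
    width: float
    height: float
    programmatic_text: List[str] = field(default_factory=list)
    bitmap_regions: List[BBox] = field(default_factory=list)


def plan_ocr(page: Page, options: SketchOcrOptions) -> Tuple[List[BBox], List[str]]:
    """Return (regions to OCR, programmatic text to keep) for one page."""
    if options.force_full_page_ocr:
        # Forced mode: OCR the whole page and discard the embedded text.
        return [(0.0, 0.0, page.width, page.height)], []
    # Assumed default: OCR only the bitmap regions and keep the embedded text.
    return list(page.bitmap_regions), list(page.programmatic_text)


page = Page(
    width=612.0,
    height=792.0,
    programmatic_text=["embedded text"],
    bitmap_regions=[(10.0, 10.0, 100.0, 50.0)],
)
forced_regions, forced_text = plan_ocr(page, SketchOcrOptions(force_full_page_ocr=True))
default_regions, default_text = plan_ocr(page, SketchOcrOptions())
```

With the flag set, the single OCR region spans the full page and no programmatic text survives; with the flag unset, the sketch leaves the embedded text untouched.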

Issue resolved by this Pull Request: Resolves #185

PeterStaar-IBM commented 1 week ago

@nikos-livathinos Maybe let's also add a CLI parameter to enforce OCR.