DS4SD / docling

Get your documents ready for gen AI
https://ds4sd.github.io/docling
MIT License
10.48k stars, 507 forks

feat(ocr): added support for PaddleOCR engine #393

Open Swaymaw opened 2 days ago

Swaymaw commented 2 days ago

This change adds support for the PaddleOCR engine, which can provide higher accuracy and better performance in use cases that involve complex PDF files.

Checklist:

mergify[bot] commented 2 days ago

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

- [X] `title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:`

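
For anyone who wants to check a title locally before opening a PR, here is a minimal Python sketch of the rule above. It assumes the `~=` operator behaves like an unanchored regular-expression search; the function name `is_conventional_title` is hypothetical and not part of Mergify or docling.

```python
import re

# Pattern copied verbatim from the merge-protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?:"
)


def is_conventional_title(title: str) -> bool:
    """Return True if the title starts with a conventional-commit prefix.

    Hypothetical helper for illustration only; assumes regex-search semantics.
    """
    return CONVENTIONAL_TITLE.search(title) is not None


if __name__ == "__main__":
    # The title of this pull request passes the check.
    print(is_conventional_title("feat(ocr): added support for PaddleOCR engine"))
    # A title without a type prefix does not.
    print(is_conventional_title("Added support for PaddleOCR engine"))
```

The scope in parentheses is optional, so `fix: typo` would pass as well, while an unlisted type such as `feature:` would not.
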
Glider95 commented 13 hours ago

Hello,

Would a RapidOCR implementation be possible too? (It is a wrapper of PaddleOCR and a lot easier to install.)

PeterStaar-IBM commented 5 hours ago

> Hello,
>
> Would a RapidOCR implementation be possible too? (It is a wrapper of PaddleOCR and a lot easier to install.)

What is the added delta with RapidOCR compared to PaddleOCR?

Swaymaw commented 4 hours ago

> > Hello, would a RapidOCR implementation be possible too? (It is a wrapper of PaddleOCR and a lot easier to install.)
>
> What is the added delta with RapidOCR compared to PaddleOCR?

It is just the poetry.lock file; nothing much has changed code-wise.