alephdata / aleph

Search and browse documents and data; find the people and companies you look for.
http://docs.aleph.occrp.org
MIT License

FEATURE: Add Persian language support package #2887

Open jlstro opened 1 year ago

jlstro commented 1 year ago

Is your feature request related to a problem? Please describe.
Aleph's OCR does not recognize Persian-language documents.

Describe the solution you'd like
Add the Tesseract Persian language pack.

Describe alternatives you've considered
Using the Arabic language pack kind of works, but it is clearly not what users want.
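
For anyone verifying the request above, here is a minimal sketch of how one might check whether a given environment already has the Persian pack installed. It relies on Tesseract's `--list-langs` flag and on `fas` being Tesseract's code for Persian; on Debian-based images the pack is typically shipped as the `tesseract-ocr-fas` package. The helper names `parse_langs` and `installed_langs` are hypothetical and not part of Aleph, and how Aleph's own images install language packs is not covered here.

```python
import shutil
import subprocess

# Tesseract identifies Persian by the three-letter code "fas".
PERSIAN = "fas"


def parse_langs(listing: str) -> set[str]:
    """Parse the text printed by `tesseract --list-langs` into a set of codes."""
    lines = [line.strip() for line in listing.splitlines()]
    # The first line is a header such as "List of available languages (3):".
    return {line for line in lines if line and not line.startswith("List of")}


def installed_langs() -> set[str]:
    """Return installed Tesseract languages, or an empty set if the binary is missing."""
    if shutil.which("tesseract") is None:
        return set()
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Some Tesseract releases print the listing to stderr, so read both streams.
    # Assumption: stderr carries no unrelated warning lines.
    return parse_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_langs()
    print("Persian available:", PERSIAN in langs)
```

If `fas` is missing from the listing, OCR requests for Persian documents would have to fall back to another pack such as Arabic (`ara`), which matches the behaviour described above.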

stchris commented 1 year ago

Potential test data: https://www.ifmat.org/02/13/confidential-documents-from-the-khatam-al-anbiya-construction-company/

tillprochaska commented 2 months ago

Components to consider when adding new language support:

Bonus: Document the process for adding language support as we go.