UB-Mannheim / zotero-ocr

Zotero Plugin for OCR
GNU Affero General Public License v3.0

Issue with Farsi OCR #49

Closed by florisre 1 year ago

florisre commented 1 year ago

The OCR itself works great: all of the recognized text is in the txt file created during OCR. However, the recognized text does not get embedded in the PDF document properly. Most of it is missing from the actual PDF file, so I can't even search for it! Let me know what further information you need to troubleshoot this.
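In case it helps with troubleshooting, here is a minimal sketch for putting a number on "most of it is not there". It compares the words in the OCR txt output with whatever text can be extracted from the PDF. The function names `words` and `text_layer_coverage` are hypothetical, made up for this illustration, and are not part of zotero-ocr or Tesseract. Getting the text out of the PDF is left to whichever extractor you have at hand, and the NFKC normalization step is an assumption about how extracted Farsi text may differ from the txt file.

```python
import re
import unicodedata


def words(text):
    """Split text into a set of normalized words (hypothetical helper)."""
    # Assumption: PDF text extraction sometimes returns Arabic presentation
    # forms; NFKC folds those back to the base letters used in plain text.
    normalized = unicodedata.normalize("NFKC", text)
    # In Python 3, \w on str patterns matches Unicode letters and digits,
    # so this works for Farsi as well as Latin script.
    return set(re.findall(r"\w+", normalized))


def text_layer_coverage(ocr_text, pdf_text):
    """Return the fraction of distinct OCR words that also appear in the PDF text."""
    ocr_words = words(ocr_text)
    if not ocr_words:
        # Nothing was recognized, so nothing can be missing.
        return 1.0
    return len(ocr_words & words(pdf_text)) / len(ocr_words)
```

Here `ocr_text` would be the contents of the txt file and `pdf_text` the text copied or extracted from the resulting PDF. A value far below 1.0 would confirm that the recognized words are not reaching the PDF, while a value near 1.0 would point to a search or display problem in the PDF reader instead.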

florisre commented 1 year ago

This might actually be an issue with Tesseract and/or PDF readers in general. See this issue: https://github.com/tesseract-ocr/tesseract/issues/2955

It will therefore have to be handled upstream.