UB-Mannheim / zotero-ocr

Zotero Plugin for OCR
GNU Affero General Public License v3.0
551 stars, 40 forks

Tesseract.exe failed #85

Closed by thegreekgeek 4 days ago

thegreekgeek commented 2 weeks ago

Hey all, I'm trying to get this running and it's... not going well. I've installed tesseract-ocr and downloaded pdftoppm, and the plugin settings point to both executables. When I try to run the plugin, however, I get this error message in the console:

[image: console error message]

After I enabled logging, I ended up finding this error:

(1)(+0001256): Error: C:\Users\Rob\AppData\Local\Programs\Tesseract-OCR\tesseract.exe failed
Error: C:\Users\Rob\AppData\Local\Programs\Tesseract-OCR\tesseract.exe failed
observe@chrome://zotero/content/xpcom/utilities_internal.js:608:27
From previous event:
recognize@jar:file:///C:/Users/Rob/AppData/Roaming/Zotero/Zotero/Profiles/o7b28zip.default/extensions/zotero-ocr@bib.uni-mannheim.de.xpi!/zotero-ocr.js:241:49

Does anyone have any insight?

aborel commented 2 weeks ago

Can you run C:\Users\Rob\AppData\Local\Programs\Tesseract-OCR\tesseract.exe in a terminal window?

If yes, can you post a screenshot of your Zotero-OCR preferences?

thegreekgeek commented 5 days ago

Yeah, invoking tesseract.exe via PowerShell gives me the help options, and I was able to successfully OCR an image as well. I just added the Tesseract-OCR folder to my PATH, but that didn't do anything. I'm still getting the error at utilities_internal.js:608:27 and zotero-ocr.js:241:49.

Here are my settings for Zotero-OCR: [image: Zotero-OCR preferences]

thegreekgeek commented 4 days ago

Alright! Managed to narrow it down: it looks like the language setting should be "eng" instead of "English". Now the pdf-to-png conversion seems to be crapping out halfway through page 6, so I have to figure out why that's happening.
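
For anyone hitting the same "tesseract.exe failed" message, a quick way to check a language setting is to compare it against the codes printed by `tesseract --list-langs`. The following is only a rough sketch, not what the plugin does internally: the function names are made up for illustration, and it assumes the language setting is handed to Tesseract as a language code (optionally several joined with `+`, as in `eng+deu`).

```python
import subprocess


def parse_list_langs(output):
    """Turn the text printed by `tesseract --list-langs` into a set of codes."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # The first line is a header such as 'List of available languages (2):';
    # every other non-empty line is one installed language code.
    return {line for line in lines
            if not line.lower().startswith("list of available")}


def missing_languages(setting, installed):
    """Return the parts of a setting like 'eng+deu' that are not installed."""
    return [code for code in setting.split("+") if code not in installed]


def installed_languages(tesseract_path):
    """Ask a local Tesseract executable which language codes it knows."""
    result = subprocess.run(
        [tesseract_path, "--list-langs"],
        capture_output=True, text=True, check=True,
    )
    # Assumption: depending on the build, the list may land on stdout or
    # stderr, so both streams are parsed here.
    return parse_list_langs(result.stdout + result.stderr)


if __name__ == "__main__":
    # Hypothetical path; replace it with the one from your own preferences.
    langs = installed_languages("tesseract")
    for setting in ("English", "eng"):
        print(setting, "-> missing:", missing_languages(setting, langs))
```

With a typical install, a setting of "English" would show up as missing while "eng" would not, which matches the fix described above.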

thegreekgeek commented 4 days ago

Aaaaand deleted the files and reran the ocr, seems to work! Thanks for the help!

aborel commented 4 days ago

Thanks for letting us know!