UB-Mannheim / zotero-ocr

Zotero Plugin for OCR
GNU Affero General Public License v3.0
552 stars 40 forks

Can't config the path to OCR engine on linux #39

Closed morpheus-sapiens-amans closed 10 months ago

morpheus-sapiens-amans commented 2 years ago

tesseract-ocr is the engine Zotero OCR uses to recognize and extract text, but the installation guide only shows the path for Windows machines.

I tried whereis tesseract-ocr to locate the engine and got /usr/share/tesseract-ocr as a result, but when I entered that path in the Zotero preferences, it said no executable was found.

Does anyone know how to configure this?

Thanks

morpheus-sapiens-amans commented 2 years ago

I've just found this other post, which describes the same problem I have.

zotero forums

Hi, I wanted to check with you guys what the parameters are for the OCR plugin on Linux. I went to the GitHub page and found that the link to the configuration for Mac and Linux does not work correctly. So here are my config parameters:

First, I installed tesseract-ocr provided by my repositories.

For the OCR engine: /usr/bin/tesseract
For pdftoppm: /usr/bin/pdftoppm
For the language script: script/Latin

zuphilip commented 2 years ago

Isn't /usr/share/tesseract-ocr just a directory? If so, you should try something like /usr/share/tesseract-ocr/tesseract instead. You can also simply leave the field empty; the default options will then be tried: https://github.com/UB-Mannheim/zotero-ocr/blob/9a1e87c8e5a588c7f9c046f812ee7de55f277ec1/chrome/content/zoteroocr.js#L36

If this does not help, activate the debug log and check exactly which path is used to call tesseract.
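Whether a path is a directory or an executable can be checked from a terminal before entering it in the preferences. The following is only a minimal sketch and not part of the plugin: `classify_path` is a hypothetical helper name, and it reports nothing more than what kind of thing a given path is.

```shell
#!/bin/sh
# classify_path is a hypothetical helper (not part of zotero-ocr).
# It reports whether a path is a directory, an executable file,
# a file without execute permission, or missing.
classify_path() {
    if [ -d "$1" ]; then
        echo "directory"
    elif [ -f "$1" ] && [ -x "$1" ]; then
        echo "executable"
    elif [ -e "$1" ]; then
        echo "not executable"
    else
        echo "missing"
    fi
}

# Ask the shell where tesseract lives, then classify that path.
candidate=$(command -v tesseract || true)
if [ -n "$candidate" ]; then
    echo "tesseract resolved to: $candidate ($(classify_path "$candidate"))"
else
    echo "tesseract is not on PATH"
fi
```

Only a path reported as "executable" is worth entering as the OCR engine; a result of "directory" points to a data folder rather than the binary.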

stweil commented 2 years ago

/usr/bin/tesseract is the correct setting for the typical installation on Linux.

Seraphli commented 1 year ago

Hi, I'm having a similar problem. In my case, I have set the right path, but it says no executable was found.

~ ❯ which tesseract   
/usr/bin/tesseract
~ ❯ which pdftoppm                  
/usr/bin/pdftoppm

I don't know why.

ChildishGiant commented 1 year ago

For me, it was because I was using the Flatpak version, which messes everything up. Reinstalling from the tarball as described here made everything work.
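To rule this cause out, one can check whether Zotero is installed as a Flatpak. This is a rough sketch under stated assumptions: `zotero_is_flatpak` is a hypothetical helper, and the application ID `org.zotero.Zotero` is assumed rather than taken from this thread, so adjust it if your installation uses a different one.

```shell
#!/bin/sh
# zotero_is_flatpak is a hypothetical helper (not part of zotero-ocr).
# Assumption: the Flatpak build of Zotero uses the application ID
# org.zotero.Zotero; pass a different ID as the first argument if needed.
zotero_is_flatpak() {
    # Without the flatpak command there can be no Flatpak install.
    command -v flatpak >/dev/null 2>&1 || return 1
    # "flatpak info" exits non-zero when the application is not installed.
    flatpak info "${1:-org.zotero.Zotero}" >/dev/null 2>&1
}

if zotero_is_flatpak; then
    echo "Zotero is installed as a Flatpak; host paths such as /usr/bin/tesseract may not be visible to it."
else
    echo "No Flatpak installation of Zotero found."
fi
```

If the first message is printed, a correct host path can still lead to "no executable found", because the sandboxed application may not see the host's /usr/bin.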

ohickl commented 12 months ago

Hi, I'm having a similar problem. In my case, I have set the right path, but it says no executable was found.

~ ❯ which tesseract   
/usr/bin/tesseract
~ ❯ which pdftoppm                  
/usr/bin/pdftoppm

I don't know why.

Same problem here on Nobara Linux 38 Wayland (GNOME 44.2).

zuphilip commented 11 months ago

Try activating the debug output in Zotero, then select a test PDF and click "OCR selected PDF(s)". The debug output should then show the exact paths used to call the different tools.

Seraphli commented 10 months ago

When I tested it today, I found that the issue seemed to be resolved. It's been a long time since then, so I don't know what caused the problem at that time.