Closed AndrewRRM closed 2 years ago
Can you save the extracted text in a text note?
It sounds like the steps are running through and then only the resulting PDF is invalid. Does this happen with every PDF you try? If yes, can you share the resulting PDF and the one you started with here as an example?
Sorry for the slow reply. Actually, I no longer get an error. In fact, nothing happens at all when I try to run OCR now, no matter which PDF I try.
Hi,
I have the same or a similar problem with Zotero on Windows 10.
When I run OCR, a command prompt for pdftoppm pops up and just sits there without doing anything. I even left it for about 30 minutes, but nothing happened (I should say it does something, since the CPU seems to react, but I don't know what). When I close pdftoppm, a command prompt for tesseract pops up and seems to do its thing; however, the PDF that it saves is corrupt. I tried a few PDF readers just to be sure.
Also worth noting: if I turn on "Save output as a note", it does seem to work, but not every time.
Running Zotero 5.0.96.2 with the latest Zotero OCR, tesseract, and poppler for Windows 21.03.0.
Let me know if you need anything else.
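To narrow down which step hangs, it can help to run the two tools by hand on a single page, outside Zotero. The following is a minimal Python sketch under stated assumptions: it is not the command line the Zotero OCR plugin itself uses, the function names and the `prefix`, `dpi` and `lang` parameters are hypothetical, and the executable names assume `pdftoppm` and `tesseract` are on the PATH (otherwise pass their full paths).

```python
import subprocess
import sys


def build_commands(pdf_path, prefix="page", dpi=300, lang="eng",
                   pdftoppm="pdftoppm", tesseract="tesseract"):
    """Return the two command lines as argument lists; nothing is executed here."""
    # Render only the first page; -singlefile writes <prefix>.png without a page number.
    render = [pdftoppm, "-png", "-r", str(dpi), "-f", "1", "-singlefile",
              pdf_path, prefix]
    # tesseract <image> <outputbase> -l <lang> pdf  writes <outputbase>.pdf
    ocr = [tesseract, prefix + ".png", prefix, "-l", lang, "pdf"]
    return render, ocr


def run_first_page(pdf_path, timeout=120):
    """Run both steps with a timeout so a hanging tool raises instead of blocking."""
    for cmd in build_commands(pdf_path):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True, timeout=timeout)


if __name__ == "__main__" and len(sys.argv) == 2:
    run_first_page(sys.argv[1])
```

If the first command already times out here, the problem is likely in the poppler installation or the input PDF rather than in the plugin.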
The "path" settings require the full path of each executable, not only the directory part. These settings work for me on macOS with M1:
I'd use different paths because the above settings would have to be updated each time after an upgrade of tesseract or poppler. In addition, I'd also replace the default script with script/Latin, which covers all Western European scripts:
Those are basically the settings I use, except for the script/Latin. (And that I'm on Win 10)
The thing is, everything starts but it won't work. The resulting PDF always comes out corrupt and unreadable.
I am closing this, since the initial question by @AndrewRRM seems to be answered by @stweil.
@otheivan If you still have any issues, then please open another issue about it. I think it is easier to separate the different operating systems in these issues.
Not sure if this is the right place for a help request so feel free to move or delete this post. I've already posted over at the Zotero forum.
I cannot get this to work.
I've installed tesseract and poppler with Homebrew, installed the zotero plugin and set the path in the Zotero plugin to:
/opt/homebrew/Cellar/tesseract/4.1.1
/opt/homebrew/Cellar/poppler/21.03.0_1/bin
I can confirm that this is where those files live.
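As noted earlier in this thread, the path settings need the full path of each executable, not just the directory that contains it. A minimal Python sketch for checking a configured value follows; the function name `check_executable_path` is hypothetical and the check only looks at the file system, it does not start the tool.

```python
import os
import sys


def check_executable_path(path):
    """Classify a configured path as 'directory', 'missing', 'not executable' or 'ok'."""
    if os.path.isdir(path):
        # A directory is not accepted; the setting must name the binary itself.
        return "directory"
    if not os.path.isfile(path):
        return "missing"
    if not os.access(path, os.X_OK):
        return "not executable"
    return "ok"


if __name__ == "__main__":
    # Usage: python check_paths.py <path> [<path> ...]
    for candidate in sys.argv[1:]:
        print(candidate, "->", check_executable_path(candidate))
```

A value that comes back as "directory" would need the executable's file name appended.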
I've also copied pdftoppm to /Applications/Zotero.app/Contents/MacOS/pdftoppm, following a recommendation on the Zotero forum, although I have tried running the OCR both with and without this step.
When I run the plugin, an ocr file appears but when I try to open it I get the following error:
Format Error: Not a PDF or corrupted.
PDF.js v2.8.146 (build: 7dd64325d) Message: Invalid PDF structure.
Help? ...
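When a viewer reports "Not a PDF or corrupted", a quick look at the raw bytes of the output file can show whether anything PDF-like was written at all. The sketch below is a rough heuristic, not a validator: it assumes that a well-formed PDF has a `%PDF-` header near the start and an `%%EOF` marker near the end, and the function name `inspect_pdf` is hypothetical.

```python
import os
import sys


def inspect_pdf(path):
    """Report the file size and whether the usual PDF start and end markers are present."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(1024)              # the header should appear near the start
        f.seek(max(0, size - 1024))
        tail = f.read()                  # the end-of-file marker should appear near the end
    return {
        "size": size,
        "has_header": b"%PDF-" in head,
        "has_eof": b"%%EOF" in tail,
    }


if __name__ == "__main__":
    # Usage: python inspect_pdf.py <file.pdf> [<file.pdf> ...]
    for name in sys.argv[1:]:
        print(name, inspect_pdf(name))
```

A size of zero or a missing header suggests the OCR step never produced real output, which points back at the tool paths or the tools themselves rather than at the PDF viewer.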