UB-Mannheim / zotero-ocr

Zotero Plugin for OCR
GNU Affero General Public License v3.0

Idea: pdfimages from poppler (xpdf) to convert pdf to images #1

Closed zuphilip closed 5 years ago

zuphilip commented 5 years ago

Zotero already uses pdfinfo and pdftotext from the poppler (xpdf) library. Modified versions of these two command-line tools are built in https://github.com/zotero/cross-poppler and shipped with Zotero. After the Zotero installation, both tools are in the Zotero program folder, e.g. C:\Program Files (x86)\Zotero\.

If pdfimages is also placed in this folder, it can be used from within Zotero in the same way. For a first test, I would suggest downloading it manually and placing it in that folder. If we see that everything works together, we can then think about whether it can be installed as part of the Zotero plugin installation.

Example call:

pdfimages.exe -png ../../Funke_1996_Meth.pdf out/funke
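
For testing outside Zotero, the call above could be wrapped in a small script. This is only a rough sketch, not how the plugin does it: the function names and the `tool_dir` parameter are made up for illustration, and it assumes a poppler build of pdfimages that supports `-png` and names its output `<prefix>-NNN.png`.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfimages_command(pdf_path, output_prefix, tool_dir=None):
    """Build the argument list for a `pdfimages -png` call.

    `tool_dir` is a hypothetical parameter: the folder that holds the
    pdfimages binary (e.g. the Zotero program folder). If it is None,
    the binary is expected to be on PATH.
    """
    exe = "pdfimages"
    if tool_dir is not None:
        exe = str(Path(tool_dir) / exe)
    return [exe, "-png", str(pdf_path), str(output_prefix)]


def extract_images(pdf_path, output_prefix, tool_dir=None):
    """Run pdfimages and return the PNG files found afterwards."""
    cmd = build_pdfimages_command(pdf_path, output_prefix, tool_dir)

    # Resolve the executable first so a missing binary gives a clear error.
    # On Windows, shutil.which also tries the .exe extension.
    resolved = shutil.which(cmd[0])
    if resolved is None:
        raise FileNotFoundError(f"pdfimages not found: {cmd[0]}")
    cmd[0] = resolved

    # Make sure the output folder exists before pdfimages writes into it.
    prefix = Path(output_prefix)
    prefix.parent.mkdir(parents=True, exist_ok=True)

    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(cmd, check=True)

    # Assumption: output files are named <prefix>-NNN.png.
    return sorted(prefix.parent.glob(prefix.name + "-*.png"))
```

With the paths from the example call, this would be used as `extract_images("../../Funke_1996_Meth.pdf", "out/funke")`, or with `tool_dir` set to the folder where pdfimages was placed.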

zuphilip commented 5 years ago

Implemented in 4f8b54a4d640711ed9211da47898c5709a7ca559, but with pdftoppm instead of pdfimages.