LeoFCardoso / pdf2pdfocr

A free tool to OCR a PDF and add a text "layer" to the original file, making a searchable PDF. Uses only open source tools. Please tip!
Apache License 2.0

Output file could not be created #21

Closed: kenyonit closed this issue 3 years ago

kenyonit commented 3 years ago

Any ideas why this would be failing? It is unable to generate a final PDF.

pdf2pdfocr.py -i test.pdf -o test2.pdf -v -k -r 200

[2020-11-03 11:16:18.697012] [LOG] Tesseract can 'textonly_pdf': True
[2020-11-03 11:16:18.702393] [LOG] Tesseract version: 4
[2020-11-03 11:16:18.702628] [DEBUG] cuneiform not available
[2020-11-03 11:16:18.716257] [DEBUG] Temp dir is /tmp/
[2020-11-03 11:16:18.716342] [DEBUG] Prefix is C6UIH
[2020-11-03 11:16:18.716374] [DEBUG] Script dir is /usr/local/bin/
[2020-11-03 11:16:18.716462] [DEBUG] Parallel operations will use 1 CPUs
[2020-11-03 11:16:18.716560] [LOG] Welcome to pdf2pdfocr version 1.6.1 marapurense - https://github.com/LeoFCardoso/pdf2pdfocr
[2020-11-03 11:16:18.719250] [LOG] Input file /home/john/test.pdf: type is application/pdf
[2020-11-03 11:16:18.720583] [DEBUG] Output file: test2.pdf for PDF and test2.pdf.txt for TXT
[2020-11-03 11:16:18.720644] [LOG] Converting input file to images...
[2020-11-03 11:16:19.544142] [LOG] Starting OCR with tesseract...
[2020-11-03 11:16:19.550422] [LOG] Waiting for OCR to complete. 0/1 pages completed...
[2020-11-03 11:16:24.553051] [LOG] OCR completed
[2020-11-03 11:16:24.553681] [DEBUG] We have 1 ocr'ed files
[2020-11-03 11:16:24.557630] [DEBUG] Joined ocr'ed PDF files
[2020-11-03 11:16:24.557677] [DEBUG] Merging with OCR
[2020-11-03 11:16:24.564783] [DEBUG] Fail to merge source PDF with extracted OCR text. Trying to fix source PDF to build final file...
[2020-11-03 11:16:25.222864] [DEBUG] Merging with OCR
Output file could not be created :(
Exiting with error code.

LeoFCardoso commented 3 years ago

Hi there, thanks for your message. Can you try with the complete path in the "-o" flag? If it keeps failing, can you share the input PDF?

kenyonit commented 3 years ago

Hi there, tested with complete path and still the same outcome. Here is the sample file: test.pdf

LeoFCardoso commented 3 years ago

Works for me... :( Can you please check your qpdf version (qpdf --version)?

kenyonit commented 3 years ago

qpdf version 8.2.1

LeoFCardoso commented 3 years ago

We need qpdf 8.4.1 at minimum... I'll fix the code to handle this situation. Can you try upgrading? Another workaround you can try is to edit line 420 so that it reads cmd_qpdf = ""
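For readers hitting the same error, the version requirement above can be checked up front. The following is only a minimal sketch, not the fix that went into pdf2pdfocr: the helper names (parse_qpdf_version, qpdf_is_recent_enough, installed_qpdf_output) are hypothetical, and it assumes `qpdf --version` prints a line of the form "qpdf version X.Y.Z", as in the output quoted in this thread.

```python
import re
import shutil
import subprocess

# Minimum qpdf version stated in this thread.
MIN_QPDF_VERSION = (8, 4, 1)


def parse_qpdf_version(output):
    """Return the version from `qpdf --version` output as a tuple of ints, or None."""
    match = re.search(r"qpdf version (\d+(?:\.\d+)*)", output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def qpdf_is_recent_enough(output, minimum=MIN_QPDF_VERSION):
    """True only if a version could be parsed and it is at least `minimum`."""
    version = parse_qpdf_version(output)
    return version is not None and version >= minimum


def installed_qpdf_output():
    """Run `qpdf --version`; return an empty string if qpdf is not on PATH."""
    if shutil.which("qpdf") is None:
        return ""
    result = subprocess.run(
        ["qpdf", "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    if qpdf_is_recent_enough(installed_qpdf_output()):
        print("qpdf is recent enough")
    else:
        print("qpdf is missing or older than 8.4.1")
```

Comparing tuples of integers avoids the string-comparison trap where "10.0.3" would sort before "8.4.1".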

kenyonit commented 3 years ago

Tested, works with qpdf-10.0.3. Also tested the workaround - works too! Cheers!

LeoFCardoso commented 3 years ago

Great! I'm fixing the code now...

kenyonit commented 3 years ago

Thanks for the fast turnaround, very much appreciated!