Closed: kobeegh closed this issue 1 year ago
Can you run a file with debug mode (loglevel 2), please?
Sure:
-----------------------------------
| ==> installation info <== |
-----------------------------------
synOCR-user: synOCR
synOCR-user is admin: yes
synOCR-version: 1.4.0
Architecture: x86_64
DSM-build: 42962
Device: 918plus (2053505210)
current Profil: default
monitor is running?: no
DB-version: 9
used image (created): jbarlow83/ocrmypdf:latest (2023-06-20T10:34:03)
document author:
used ocr-parameter (raw): -srd -l deu
OCR-arg 1: -srd
OCR-arg 2: -l
OCR-arg 3: deu
ocropt_array: -srd -l deu
search prefix:
replace search prefix: no
renaming syntax:
Symbol for tag marking: #
target file handling: no
Document split pattern:
split page handling: discard
delete blank pages:
threshold black/white:
threshold black pixels:
clean up spaces: false
Date search method: use standard search via RegEx
date found order: firstfound
source for filedate: now
ignored dates by search:
date range in past: 0 [absolute: 0]
date range in future: 0 [absolute: 0]
PATH-Variable: /sbin:/bin:/usr/sbin:/usr/bin:/usr/syno/sbin:/usr/syno/bin:/usr/local/sbin:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/syno/bin:/usr/syno/sbin:/usr/local/bin:/opt/usr/bin:/usr/syno/synoman/webman/3rdparty/synOCR/bin:/usr/local/bin:/opt/usr/bin
Docker test: OK
DSM notify to user: @administrators
apprise notify service:
apprise attachment: false
notify language: enu
Loglevel: debug
max. count of logfiles: 10
rotate backupfiles after: (purge backup deactivated)
Source directory: /volume1/OCR/_INPUT/
Target directory: /volume1/OCR/_OUTPUT/
BackUp directory: /volume1/OCR/_BACKUP/
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● ---------------------------------- ●
● | ==> RUN THE FUNCTIONS <== | ●
● ---------------------------------- ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
-----------------------------------------------------------------------------------
| check the python3 installation and the necessary modules: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:01]
Check Python:
module list:
Package Version
--------------------- -----------
apprise 1.4.0
argcomplete 3.0.8
backports.zoneinfo 0.2.1
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
dateparser 1.1.8
DateTime 5.1
deprecation 2.1.0
idna 3.4
importlib-metadata 6.7.0
lxml 4.9.2
Markdown 3.4.3
oauthlib 3.2.2
packaging 23.1
pikepdf 7.1.2
Pillow 9.5.0
pip 23.1.2
PyPDF2 2.3.1
python-dateutil 2.8.2
pytz 2023.3
pytz-deprecation-shim 0.1.0.post0
PyYAML 6.0
regex 2023.5.5
requests 2.31.0
requests-oauthlib 1.3.1
setuptools 56.0.0
six 1.16.0
tomlkit 0.11.8
typing_extensions 4.5.0
tzdata 2023.3
tzlocal 4.3
urllib3 2.0.3
xmltodict 0.13.0
yq 3.2.2
zipp 3.15.0
zope.interface 6.0
prepare_python: OK
Target temp directory: /tmp/tmp.rivsr98dQA
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● STEP 1 - RUN OCR / SPLIT FILES, IF NEEDED: ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE: ➜ 2023.07.01 - testfile.pdf
temp. target file: /tmp/tmp.rivsr98dQA/step1_tmp_1688575421/2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| processing PDF @ OCRmyPDF: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:00]
➜ OCRmyPDF-LOG:
WARNING: Error loading config file: .dockercfg: $HOME is not defined
DEBUG ocrmypdf - ocrmypdf 14.2.2.dev31+g7c38c717.d20230620
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Found tesseract 5.3.1-22-g24da
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Found gs 9.55.0
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (7):
chi_sim
deu
eng
fra
osd
por
spa
INFO ocrmypdf._validation - reading file from standard input
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/stdin, /tmp/ocrmypdf.io.uuaw1x_6/origin.pdf)
DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
INFO ocrmypdf._pipeline - 1 skipping all processing on this page
DEBUG ocrmypdf._graft - 1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
DEBUG ocrmypdf._graft - 1 Page rotation: (content, auto) -> page = (0, 0) -> 0
INFO ocrmypdf._sync - Postprocessing...
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/graft_layers.pdf, /tmp/ocrmypdf.io.uuaw1x_6/fix_docinfo.pdf)
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr', '/tmp/ocrmypdf.io.uuaw1x_6/fix_docinfo.pdf', '/tmp/ocrmypdf.io.uuaw1x_6/pdfa.ps']
DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0 (2021-09-27)
DEBUG ocrmypdf.subprocess.gs - Copyright (C) 2021 Artifex Software, Inc. All rights reserved.
DEBUG ocrmypdf.subprocess.gs - This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
DEBUG ocrmypdf.subprocess.gs - see the file COPYING for details.
DEBUG ocrmypdf.subprocess.gs - Processing pages 1 through 1.
DEBUG ocrmypdf.subprocess.gs - Page 1
DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0: Setting Overprint Mode to 1
DEBUG ocrmypdf.subprocess.gs - not permitted in PDF/A-2, overprint mode not set
DEBUG ocrmypdf.subprocess.gs -
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - XrefExt(xref=224, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=225, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=218, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=219, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=220, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=221, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=222, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=223, ext='.png')
DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 8
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 225: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 218: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 219: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 221: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 223: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 225: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 218: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 219: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 223: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.uuaw1x_6/optimize.opt.pdf, /tmp/ocrmypdf.io.uuaw1x_6/optimize.pdf)
DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
INFO ocrmypdf._pipeline - Image optimization ratio: 1.00 savings: 0.4%
INFO ocrmypdf._pipeline - Total file size ratio: 1.01 savings: 0.6%
DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.uuaw1x_6/optimize.pdf -> -
INFO ocrmypdf._sync - Output sent to stdout
← OCRmyPDF-LOG-END
[runtime up to now: 00:00:18]
target file (OK): /tmp/tmp.rivsr98dQA/step1_tmp_1688575421/2023.07.01 - testfile.pdf
no split pattern defined or splitting not possible
-----------------------------------------------------------------------------------
| handle source file: |
-----------------------------------------------------------------------------------
➜ backup source file to: /volume1/OCR/_BACKUP/2023.07.01 - testfile.pdf
removed directory '/tmp/tmp.rivsr98dQA/step1_tmp_1688575421/'
Stats:
runtime last file: ➜ 00:00:18
runtime 1st step (all files): ➜ 00:00:25
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● STEP 2 - SEARCH TAGS / RENAME / SORT: ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
list files in INPUT with transcoded special characters:
➜ 2023.07.01 - testfile.pdf$
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pypdf'
ERROR at line 2284: pagecount_latest=$( py_page_count "${input}" )
(pages counted with python module pypdf)
./synOCR.sh: line 2299: 1182+ERROR at line 1739: python3
ERROR at line 2284: python3: syntax error in expression (error token is "at line 1739: python3
ERROR at line 2284: python3")
purge log files ...
delete 1 log files ( > 10 files)
delete -10 search files ( > 10 files)
purge backup deactivated!
rmdir: failed to remove '/tmp/tmp.rivsr98dQA': Directory not empty
rmdir: removing directory, '/tmp/tmp.rivsr98dQA'
runtime all files: ➜ 00:00:25
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● ---------------------------------- ●
● | ==> END OF FUNCTIONS <== | ●
● ---------------------------------- ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
For some reason, the Python environment for synOCR has not been updated.
Could you run the command below in a terminal or via the DSM Task Scheduler? It deletes the Python environment so that it is recreated on the next run.
rm -rf /usr/syno/synoman/webman/3rdparty/synOCR/python3_env
Alternatively, you can use Hyper Backup to create a backup of synOCR, uninstall synOCR, reinstall it, and then restore the backup.
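Before deleting the environment, it may help to confirm which modules are actually missing. The module list in the failed run shows PyPDF2 2.3.1 but no pypdf, which matches the ModuleNotFoundError. The following is only a minimal sketch, assuming it is run with the environment's own interpreter (/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/bin/python3). The function name `missing_modules` and the list of import names are illustrative and not part of synOCR:

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names are assumptions derived from the package list in the log
    # (for example, the PyYAML package is imported as "yaml").
    wanted = ["pypdf", "pikepdf", "dateparser", "yaml", "apprise"]
    print("missing:", missing_modules(wanted) or "none")
```

If `pypdf` shows up as missing, the environment is most likely still the one created by the previous synOCR version, and recreating it as described above should install it.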
I deleted the directory and ran it again. It works now, thank you very much!
In case it is of interest, here is the debug log from the successful run, in which the Python environment and modules are set up from scratch:
-----------------------------------
| ==> installation info <== |
-----------------------------------
synOCR-user: synOCR
synOCR-user is admin: yes
synOCR-version: 1.4.0
Architecture: x86_64
DSM-build: 42962
Device: 918plus (2053505210)
current Profil: default
monitor is running?: no
DB-version: 9
used image (created): jbarlow83/ocrmypdf:latest (2023-06-20T10:34:03)
document author:
used ocr-parameter (raw): -srd -l deu
OCR-arg 1: -srd
OCR-arg 2: -l
OCR-arg 3: deu
ocropt_array: -srd -l deu
search prefix:
replace search prefix: no
renaming syntax:
Symbol for tag marking: #
target file handling: no
Document split pattern:
split page handling: discard
delete blank pages:
threshold black/white:
threshold black pixels:
clean up spaces: false
Date search method: use standard search via RegEx
date found order: firstfound
source for filedate: now
ignored dates by search:
date range in past: 0 [absolute: 0]
date range in future: 0 [absolute: 0]
PATH-Variable: /sbin:/bin:/usr/sbin:/usr/bin:/usr/syno/sbin:/usr/syno/bin:/usr/local/sbin:/usr/local/bin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/syno/bin:/usr/syno/sbin:/usr/local/bin:/opt/usr/bin:/usr/syno/synoman/webman/3rdparty/synOCR/bin:/usr/local/bin:/opt/usr/bin
Docker test: OK
DSM notify to user: @administrators
apprise notify service:
apprise attachment: false
notify language: enu
Loglevel: debug
max. count of logfiles: 10
rotate backupfiles after: (purge backup deactivated)
Source directory: /volume1/OCR/_INPUT/
Target directory: /volume1/OCR/_OUTPUT/
BackUp directory: /volume1/OCR/_BACKUP/
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● ---------------------------------- ●
● | ==> RUN THE FUNCTIONS <== | ●
● ---------------------------------- ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
-----------------------------------------------------------------------------------
| check the python3 installation and the necessary modules: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:00]
Check Python:
python3 already installed (/usr/syno/synoman/webman/3rdparty/synOCR/python3_env/bin/python3)
Check pip:
pip already installed (pip 21.1.1 from /usr/syno/synoman/webman/3rdparty/synOCR/python3_env/lib/python3.8/site-packages/pip (python 3.8)) / upgrade available ...
Requirement already satisfied: pip in ./python3_env/lib/python3.8/site-packages (21.1.1)
Collecting pip
Using cached pip-23.1.2-py3-none-any.whl (2.1 MB)
Installing collected packages: pip
Attempting uninstall: pip
Found existing installation: pip 21.1.1
Uninstalling pip-21.1.1:
Successfully uninstalled pip-21.1.1
Successfully installed pip-23.1.2
read installed python modules:
Package Version
---------- -------
pip 23.1.2
setuptools 56.0.0
➜ check python module "DateTime": ➜ DateTime was not found and will be installed ➜ ok
➜ check python module "dateparser": ➜ dateparser was not found and will be installed ➜ ok
➜ check python module "pypdf==3.5.1": ➜ pypdf==3.5.1 was not found and will be installed ➜ ok
➜ check python module "pikepdf==7.1.2": ➜ pikepdf==7.1.2 was not found and will be installed ➜ ok
➜ check python module "Pillow": ➜ Pillow was not found and will be installed ➜ ok
➜ check python module "yq": ➜ yq was not found and will be installed ➜ ok
➜ check python module "PyYAML": ➜ PyYAML was not found and will be installed ➜ ok
➜ check python module "apprise": ➜ apprise was not found and will be installed ➜ ok
module list:
Package Version
------------------ --------
apprise 1.4.0
argcomplete 3.1.1
backports.zoneinfo 0.2.1
certifi 2023.5.7
charset-normalizer 3.1.0
click 8.1.3
dateparser 1.1.8
DateTime 5.1
deprecation 2.1.0
idna 3.4
importlib-metadata 6.7.0
lxml 4.9.3
Markdown 3.4.3
oauthlib 3.2.2
packaging 23.1
pikepdf 7.1.2
Pillow 10.0.0
pip 23.1.2
pypdf 3.5.1
python-dateutil 2.8.2
pytz 2023.3
PyYAML 6.0
regex 2023.6.3
requests 2.31.0
requests-oauthlib 1.3.1
setuptools 56.0.0
six 1.16.0
tomlkit 0.11.8
typing_extensions 4.7.1
tzlocal 5.0.1
urllib3 2.0.3
xmltodict 0.13.0
yq 3.2.2
zipp 3.15.0
zope.interface 6.0
prepare_python: OK
Target temp directory: /tmp/tmp.oMs0RqPMX7
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● STEP 1 - RUN OCR / SPLIT FILES, IF NEEDED: ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE: ➜ 2023.07.01 - testfile.pdf
temp. target file: /tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| processing PDF @ OCRmyPDF: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:00]
➜ OCRmyPDF-LOG:
WARNING: Error loading config file: .dockercfg: $HOME is not defined
DEBUG ocrmypdf - ocrmypdf 14.2.2.dev31+g7c38c717.d20230620
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Found tesseract 5.3.1-22-g24da
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Found gs 9.55.0
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = List of available languages in "/usr/share/tesseract-ocr/5/tessdata/" (7):
chi_sim
deu
eng
fra
osd
por
spa
INFO ocrmypdf._validation - reading file from standard input
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/stdin, /tmp/ocrmypdf.io.0889pg_j/origin.pdf)
DEBUG ocrmypdf.builtin_plugins.tesseract_ocr - Using Tesseract OpenMP thread limit 3
INFO ocrmypdf._pipeline - 1 skipping all processing on this page
DEBUG ocrmypdf._graft - 1 Text rotation: (text, autorotate, content) -> text misalignment = (0, 0, 0) -> 0
DEBUG ocrmypdf._graft - 1 Page rotation: (content, auto) -> page = (0, 0) -> 0
INFO ocrmypdf._sync - Postprocessing...
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/graft_layers.pdf, /tmp/ocrmypdf.io.0889pg_j/fix_docinfo.pdf)
DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
DEBUG ocrmypdf.subprocess - Running: ['gs', '-dBATCH', '-dNOPAUSE', '-dSAFER', '-dCompatibilityLevel=1.6', '-sDEVICE=pdfwrite', '-dAutoRotatePages=/None', '-sColorConversionStrategy=LeaveColorUnchanged', '-dPDFSTOPONERROR', '-dAutoFilterColorImages=true', '-dAutoFilterGrayImages=true', '-dJPEGQ=95', '-dPDFA=2', '-dPDFACompatibilityPolicy=1', '-o', '-', '-sstdout=%stderr', '/tmp/ocrmypdf.io.0889pg_j/fix_docinfo.pdf', '/tmp/ocrmypdf.io.0889pg_j/pdfa.ps']
DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0 (2021-09-27)
DEBUG ocrmypdf.subprocess.gs - Copyright (C) 2021 Artifex Software, Inc. All rights reserved.
DEBUG ocrmypdf.subprocess.gs - This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
DEBUG ocrmypdf.subprocess.gs - see the file COPYING for details.
DEBUG ocrmypdf.subprocess.gs - Processing pages 1 through 1.
DEBUG ocrmypdf.subprocess.gs - Page 1
DEBUG ocrmypdf.subprocess.gs - GPL Ghostscript 9.55.0: Setting Overprint Mode to 1
DEBUG ocrmypdf.subprocess.gs - not permitted in PDF/A-2, overprint mode not set
DEBUG ocrmypdf.subprocess.gs -
DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - XrefExt(xref=224, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=225, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=218, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=219, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=220, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=221, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=222, ext='.png')
DEBUG ocrmypdf.optimize - XrefExt(xref=223, ext='.png')
DEBUG ocrmypdf.optimize - Optimizable images: JPEGs: 0 PNGs: 8
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 225: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 218: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 219: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 221: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 223: marking this JPEG as deflatable
DEBUG ocrmypdf.optimize - xref 223: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 218: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 222: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 225: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 220: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 219: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 221: treating as an optimization candidate
DEBUG ocrmypdf.optimize - xref 224: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 225: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 218: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 219: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 221: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - xref 223: found image compressed as /FlateDecode /DCTDecode, marked for JPEG optimization
DEBUG ocrmypdf.optimize - Optimizable images: JBIG2 groups: 0
DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.0889pg_j/optimize.opt.pdf, /tmp/ocrmypdf.io.0889pg_j/optimize.pdf)
DEBUG ocrmypdf.subprocess - Running: ['jbig2', '--version']
DEBUG ocrmypdf.subprocess - Running: ['pngquant', '--version']
INFO ocrmypdf._pipeline - Image optimization ratio: 1.00 savings: 0.4%
INFO ocrmypdf._pipeline - Total file size ratio: 1.01 savings: 0.6%
DEBUG ocrmypdf._pipeline - /tmp/ocrmypdf.io.0889pg_j/optimize.pdf -> -
INFO ocrmypdf._sync - Output sent to stdout
← OCRmyPDF-LOG-END
[runtime up to now: 00:00:16]
target file (OK): /tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/2023.07.01 - testfile.pdf
no split pattern defined or splitting not possible
-----------------------------------------------------------------------------------
| handle source file: |
-----------------------------------------------------------------------------------
➜ backup source file to: /volume1/OCR/_BACKUP/2023.07.01 - testfile.pdf
removed directory '/tmp/tmp.oMs0RqPMX7/step1_tmp_1688576586/'
Stats:
runtime last file: ➜ 00:00:16
runtime 1st step (all files): ➜ 00:01:49
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● STEP 2 - SEARCH TAGS / RENAME / SORT: ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
list files in INPUT with transcoded special characters:
➜ 2023.07.01 - testfile.pdf$
(pages counted with python module pypdf)
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
CURRENT FILE: ➜ 2023.07.01 - testfile.pdf
➜ File permissions source file:
-rw-rw-r-- 1 synOCR synOCR 219590 Jul 5 18:02 /tmp/tmp.oMs0RqPMX7/2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| search tags in ocr text: |
-----------------------------------------------------------------------------------
no tags defined
-----------------------------------------------------------------------------------
| search for a valid date in ocr text: |
-----------------------------------------------------------------------------------
run RegEx date search - search for date format: 1 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
run RegEx date search - search for date format: 2 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
run RegEx date search - search for date format: 3 (1 = dd mm [yy]yy; 2 = [yy]yy mm dd; 3 = mm dd [yy]yy)
Date not found in OCR text - use file date:
day: 05
month:07
year: 2023
-----------------------------------------------------------------------------------
| rename and sort to target folder: |
-----------------------------------------------------------------------------------
[runtime up to now: 00:00:01]
➜ renaming:
apply renaming syntax ➜ ! WARNING ! – No variables were found for renaming. A fallback is used to prevent an empty file name: 2023.07.01 - testfile
[runtime up to now: 00:00:01]
➜ insert metadata (use python pikepdf)
used metadata:
➜ '/Author': '',
➜ '/Keywords': '',
➜ '/CreationDate': 'D:20230705',
➜ '/CreatorTool': 'synOCR 1.4.0'
call handlePdf.py -dbg_lvl "2" -dbg_file "/volume1/OCR/_LOG/synOCR_2023-07-05_19-01-33.log" -task metadata -inputFile "/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf" -metaData "{'/Author': '',
'/Keywords': '',
'/CreationDate': 'D:20230705',
'/CreatorTool': 'synOCR 1.4.0'}" -outputFile "/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf"
2023-07-05 19:03:23,958 - INFO - HandlePdf started
2023-07-05 19:03:23,958 - INFO - Version: 0.2
2023-07-05 19:03:23,958 - INFO - Task=metadata
2023-07-05 19:03:23,959 - DEBUG - set_task_metadata_parameter(input_file=/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf, output_file=/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf, meta_data_str={'/Author': '',
'/Keywords': '',
'/CreationDate': 'D:20230705',
'/CreatorTool': 'synOCR 1.4.0'})
2023-07-05 19:03:23,959 - DEBUG - <<<<<< set_task_meta_data_parameter ended
2023-07-05 19:03:23,959 - DEBUG - >>>>>> open_pdf started
2023-07-05 19:03:23,965 - DEBUG - <<<<<< open_pdf ended
2023-07-05 19:03:23,966 - INFO - >>>>> write meta_data started
2023-07-05 19:03:23,966 - DEBUG - old meta_data....
2023-07-05 19:03:23,966 - DEBUG - >>>>> log metadata >>>>>)
2023-07-05 19:03:23,967 - DEBUG - <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP toolkit 2.9.1-13, framework 1.6">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:iX="http://ns.adobe.com/iX/1.0/">
<rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="" pdf:Producer="pikepdf 7.2.0"/>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about=""><xmp:ModifyDate>2023-07-05T17:03:17+00:00</xmp:ModifyDate>
<xmp:CreateDate>2023-06-27T09:46:39+02:00</xmp:CreateDate>
<xmp:CreatorTool>ocrmypdf 14.2.2.dev31+g7c38c717.d20230620 / Tesseract OCR-PDF 5.3.1-22-g24da</xmp:CreatorTool></rdf:Description>
<rdf:Description xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" rdf:about="" xapMM:DocumentID="uuid:62258a05-5372-11f9-0000-5664f20a76f7"/>
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="" dc:format="application/pdf"><dc:title><rdf:Alt><rdf:li xml:lang="x-default">Deg. 27. KW</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>KOBILIN</rdf:li></rdf:Seq></dc:creator></rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="" pdfaid:part="2" pdfaid:conformance="B"/><rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:MetadataDate="2023-07-05T17:03:17.438999+00:00"/></rdf:RDF>
</x:xmpmeta>
2023-07-05 19:03:23,967 - DEBUG - <<<<< log metadata <<<<<)
2023-07-05 19:03:24,020 - DEBUG - new meta_data....
2023-07-05 19:03:24,020 - DEBUG - <x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="XMP toolkit 2.9.1-13, framework 1.6">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:iX="http://ns.adobe.com/iX/1.0/">
<rdf:Description xmlns:pdf="http://ns.adobe.com/pdf/1.3/" rdf:about="" pdf:Producer="pikepdf 7.1.2"/>
<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about=""><xmp:ModifyDate>2023-07-05T00:00:00</xmp:ModifyDate>
<xmp:CreateDate>2023-07-05T00:00:00</xmp:CreateDate>
<xmp:CreatorTool>synOCR 1.4.0</xmp:CreatorTool></rdf:Description>
<rdf:Description xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/" rdf:about="" xapMM:DocumentID="uuid:62258a05-5372-11f9-0000-5664f20a76f7"/>
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="" dc:format="application/pdf"><dc:title><rdf:Alt><rdf:li xml:lang="x-default">Deg. 27. KW</rdf:li></rdf:Alt></dc:title><dc:creator><rdf:Seq><rdf:li>KOBILIN</rdf:li></rdf:Seq></dc:creator></rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="" pdfaid:part="2" pdfaid:conformance="B"/><rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/" rdf:about="" xmp:MetadataDate="2023-07-05T19:03:23.974217+02:00"/></rdf:RDF>
</x:xmpmeta>
2023-07-05 19:03:24,021 - INFO - save pdf to file (/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/temp_2023.07.01 - testfile_1688576602.pdf_meta.pdf)
2023-07-05 19:03:24,051 - DEBUG - <<<<<< write meta_data ended
empty
0
[runtime up to now: 00:00:02]
target file: 2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| adjusts the attributes of the target file: |
-----------------------------------------------------------------------------------
➜ Adapt file date (Source: NOW)
➜ File permissions target file:
-rwxrwxrwx+ 1 synOCR synOCR 219453 Jul 5 19:03 /volume1/OCR/_OUTPUT/2023.07.01 - testfile.pdf
-----------------------------------------------------------------------------------
| final tasks: |
-----------------------------------------------------------------------------------
INFO: Notify for apprise not defined ...
run user defined post scripts:
Stats:
runtime last file: ➜ 00:00:05
pagecount last file: ➜ 1
file count profile : ➜ (profile default) - 191 PDF's / 634 Pages processed up to now
file count total: ➜ 350 PDF's / 1183 Pages processed up to now since 2019-06-04
cleanup:
delete tmp-files ...
removed '/tmp/tmp.oMs0RqPMX7/2023.07.01 - testfile.pdf'
removed '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/synOCR.txt'
removed '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/synOCR_filename.txt'
removed directory '/tmp/tmp.oMs0RqPMX7/step2_tmp_1688576602/'
removed directory '/tmp/tmp.oMs0RqPMX7'
purge log files ...
delete 1 log files ( > 10 files)
delete -9 search files ( > 10 files)
purge backup deactivated!
runtime all files: ➜ 00:01:54
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
● ---------------------------------- ●
● | ==> END OF FUNCTIONS <== | ●
● ---------------------------------- ●
●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●●
Nice 😃
synOCR writes the current version to …/python3_env/synOCR_python_env_version. Every time synOCR is started, the saved version is compared with the installed synOCR version, and if they differ, the Python environment is updated. For some reason this check does not seem to work reliably, but I have not found the error yet.
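The check described above can be sketched roughly as follows. This is not synOCR's actual code (synOCR implements it in its shell script). The marker file name comes from the comment above, while the function name `env_needs_update` and the decision to treat a missing marker as outdated are assumptions made for illustration:

```python
from pathlib import Path


def env_needs_update(env_dir, installed_version):
    """Return True when the stored version marker is missing or differs."""
    marker = Path(env_dir) / "synOCR_python_env_version"
    try:
        saved = marker.read_text(encoding="utf-8").strip()
    except OSError:
        # No readable marker file: assume the environment is outdated.
        return True
    # Compare after stripping whitespace so a trailing newline does not matter.
    return saved != installed_version.strip()
```

Treating an unreadable marker as "needs update" errs on the side of rebuilding the environment, which costs time on one run but avoids working with stale modules.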
I am running synOCR on a Synology NAS with DSM 7.1.1-42962 Update 6. It worked fine with version 1.3.3. I updated to 1.4.0 today, and now it fails. Afterwards the document is in neither the input nor the output folder. Luckily, it is still in the backup folder.
According to the log file, everything seems normal until step 2 is reached, see below: