Stirling-Tools / Stirling-PDF

#1 Locally hosted web application that allows you to perform various operations on PDF files
GNU General Public License v3.0
29.63k stars 2.17k forks source link

[Bug]: Problem using Cleanup Scans/OCR #1493

Open topoldo opened 1 week ago

topoldo commented 1 week ago

The Problem

I have a pdf test file which I attach here (test.pdf) test.pdf which I submitted to use Stirling-PDF Cleanup Scans/OCR function. In the attached file option.png you find the options I adopted. Option

Version of Stirling-PDF

0.26.1 - Docker on a Synology device

Last Working Version of Stirling-PDF

None tested before

Page Where the Problem Occurred

http://192.168.1.50:8080/ocr-pdf?lang=en_US [just to see the internal URL I used]

Docker Configuration

No response

Relevant Log Output

ERROR
---------
Internal Server Error:java.io.IOException: Command process failed with exit code 2.Error message: DEBUG ocrmypdf - ocrmypdf 16.1.1 DEBUG ocrmypdf.subprocess - Running: ['unpaper', '--version'] DEBUG ocrmypdf.subprocess - Found unpaper 7.0.0 DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version'] DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4 DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version'] DEBUG ocrmypdf.subprocess - Running: ['gs', '--version'] DEBUG ocrmypdf.subprocess - Found gs 10.2.1 DEBUG ocrmypdf.subprocess - Running: ['gs', '--version'] DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs'] DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat). [DS] Device[1] 0:(null) score is 1.146822 [DS] Selected Device[1]: "(null)" (Native) List of available languages in "/usr/share/tessdata/" (3): eng ita lat DEBUG ocrmypdf.helpers - pikepdf mmap enabled DEBUG ocrmypdf.helpers - os.symlink(/tmp/input_1371985821737702192.pdf, /tmp/ocrmypdf.io.c_b14rry/origin) DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.c_b14rry/origin, /tmp/ocrmypdf.io.c_b14rry/origin.pdf) DEBUG root - Gathering info with 1 thread workers DEBUG ocrmypdf.helpers - pikepdf mmap enabled ERROR ocrmypdf._pipelines._common - ExitCodeException Traceback (most recent call last): File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler return fn(options, plugin_manager) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 188, in _run_pipeline validate_pdfinfo_options(context) File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 204, in validate_pdfinfo_options raise DigitalSignatureError() ocrmypdf.exceptions.DigitalSignatureError: Input PDF has a digital signature. OCR would alter the document, invalidating the signature. 

STACK TRACE
------------
java.io.IOException: Command process failed with exit code 2. Error message:   DEBUG ocrmypdf - ocrmypdf 16.1.1
  DEBUG ocrmypdf.subprocess - Running: ['unpaper', '--version']
  DEBUG ocrmypdf.subprocess - Found unpaper 7.0.0
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
  DEBUG ocrmypdf.subprocess - Found tesseract 5.3.4
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--version']
  DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
  DEBUG ocrmypdf.subprocess - Found gs 10.2.1
  DEBUG ocrmypdf.subprocess - Running: ['gs', '--version']
  DEBUG ocrmypdf.subprocess - Running: ['tesseract', '--list-langs']
  DEBUG ocrmypdf.subprocess.tesseract - stdout/stderr = [DS] Profile read from file (tesseract_opencl_profile_devices.dat).
[DS] Device[1] 0:(null) score is 1.146822
[DS] Selected Device[1]: "(null)" (Native)
List of available languages in "/usr/share/tessdata/" (3):
eng
ita
lat

  DEBUG ocrmypdf.helpers - pikepdf mmap enabled
  DEBUG ocrmypdf.helpers - os.symlink(/tmp/input_1371985821737702192.pdf, /tmp/ocrmypdf.io.c_b14rry/origin)
  DEBUG ocrmypdf.helpers - os.symlink(/tmp/ocrmypdf.io.c_b14rry/origin, /tmp/ocrmypdf.io.c_b14rry/origin.pdf)
  DEBUG root - Gathering info with 1 thread workers
  DEBUG ocrmypdf.helpers - pikepdf mmap enabled

  ERROR ocrmypdf._pipelines._common - ExitCodeException
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
    return fn(options, plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 188, in _run_pipeline
    validate_pdfinfo_options(context)
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 204, in validate_pdfinfo_options
    raise DigitalSignatureError()
ocrmypdf.exceptions.DigitalSignatureError: Input PDF has a digital signature. OCR would alter the document,
invalidating the signature.

    at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:190)
    at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:85)
    at stirling.software.SPDF.controller.api.misc.OCRController.processPdfWithOCR(OCRController.java:148)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:255)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:925)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:830)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:547)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
    at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
    at org.eclipse.jetty.ee10.websocket.servlet.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:195)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:61)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
    at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:814)
    at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:431)
    at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:571)
    at org.eclipse.jetty.ee10.servlet.SessionHandler.handle(SessionHandler.java:703)
    at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.eclipse.jetty.server.Server.handle(Server.java:179)
    at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:619)
    at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:411)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
    at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:971)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1201)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1156)
    at java.base/java.lang.Thread.run(Thread.java:1583)

Additional Information

No response

Browsers Affected

No response

No Duplicate of the Issue

Frooodle commented 1 week ago

issue is

Input PDF has a digital signature. OCR would alter the document,
invalidating the signature.

looks like a duplicate of issue resolved for PDFA conversion https://github.com/Stirling-Tools/Stirling-PDF/pull/1360

Will add same solution, thanks for raising