Stirling-Tools / Stirling-PDF

#1 Locally hosted web application that allows you to perform various operations on PDF files
https://stirlingpdf.com
MIT License

[Bug]: Error when converting from PDF to PDF/A #1638

Closed. bjoern-vh closed this issue 1 month ago.

bjoern-vh commented 3 months ago

The Problem

Trying to convert an ordinary PDF file to PDF/A, I got the following error message after a short time:

...in getcolor raise ValueError(msg) ValueError: cannot add non-opaque RGBA color to RGB palette

I attached the error message and the stack trace in the relevant log output.

If you need more information, I will try to help.
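
For readers triaging the message above: the traceback shows it being raised from `ImagePalette.getcolor` while Pillow parses the palette of an embedded JPEG 2000 image. The following is a simplified, standard-library-only model of that kind of check, not Pillow's actual code. The function name `add_to_rgb_palette` is hypothetical and only illustrates why a four-component palette entry whose alpha is below 255 would be rejected.

```python
def add_to_rgb_palette(palette, color):
    """Add a color to an RGB palette (a list of tuples) and return its index.

    Simplified stand-in for the check seen in the traceback;
    a hypothetical helper, not Pillow's implementation.
    """
    if len(color) == 4:
        # An RGB palette has no room for alpha, so only fully opaque
        # RGBA entries can be stored (by dropping the alpha channel).
        if color[3] != 255:
            raise ValueError("cannot add non-opaque RGBA color to RGB palette")
        color = color[:3]
    color = tuple(color)
    if color not in palette:
        palette.append(color)
    return palette.index(color)


palette = []
add_to_rgb_palette(palette, (255, 0, 0))        # plain RGB entry is accepted
add_to_rgb_palette(palette, (0, 255, 0, 255))   # opaque RGBA entry is accepted
try:
    add_to_rgb_palette(palette, (0, 0, 255, 128))  # translucent entry is rejected
except ValueError as exc:
    print(exc)
```

Under this model, a PDF containing a paletted JPEG 2000 image with translucent palette entries would be enough to trigger the failure, which is consistent with the error appearing during image inspection rather than during the conversion itself.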

Version of Stirling-PDF

0.26.1

Last Working Version of Stirling-PDF

No response

Page Where the Problem Occurred

https://pdf.mydomain/pdf-to-pdfa

Docker Configuration

No response

Relevant Log Output

Error message:

Internal Server Error:java.io.IOException: Command process failed with exit code 15. Error message: An exception occurred while executing the pipeline Traceback (most recent call last): File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler return fn(options, plugin_manager) ^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 175, in _run_pipeline pdfinfo = get_pdfinfo( ^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 175, in get_pdfinfo return PdfInfo( ^^^^^^^^ File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 1118, in __init__ self._pages = _pdf_pageinfo_concurrent( ^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 777, in _pdf_pageinfo_concurrent executor( File "/usr/lib/python3.12/site-packages/ocrmypdf/_concurrent.py", line 78, in __call__ self._execute( File "/usr/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line 144, in _execute result = future.result() ^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result return self.__get_result() ^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result raise self._exception File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run result = self.fn(*self.args, **self.kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 726, in _pdf_pageinfo_sync return PageInfo(pdf, pageno, infile, check_pages, detailed_analysis) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 841, in __init__ self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis) File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 892, in _gather_pageinfo for info 
in _process_content_streams( File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 638, in _process_content_streams yield from _find_form_xobject_images(pdf, container, contentsinfo) File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 583, in _find_form_xobject_images yield from _process_content_streams( File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 638, in _process_content_streams yield from _find_form_xobject_images(pdf, container, contentsinfo) File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 583, in _find_form_xobject_images yield from _process_content_streams( File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 637, in _process_content_streams yield from _find_regular_images(container, contentsinfo) File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 553, in _find_regular_images yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 363, in __init__ pim = PdfImage(pdfimage) ^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/pikepdf/models/image.py", line 831, in __init__ self._jpxpil = self.as_pil_image() ^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/pikepdf/models/image.py", line 740, in as_pil_image return Image.open(bio) ^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/PIL/Image.py", line 3318, in open im = _open_core(fp, filename, prefix, formats) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/PIL/Image.py", line 3304, in _open_core im = factory(fp, filename) ^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/PIL/ImageFile.py", line 137, in __init__ self._open() File "/usr/lib/python3.12/site-packages/PIL/Jpeg2KImagePlugin.py", line 224, in _open header = _parse_jp2_header(self.fp) 
^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/usr/lib/python3.12/site-packages/PIL/Jpeg2KImagePlugin.py", line 185, in _parse_jp2_header palette.getcolor(header.read_fields(">" + ("B" * npc))) File "/usr/lib/python3.12/site-packages/PIL/ImagePalette.py", line 144, in getcolor raise ValueError(msg) ValueError: cannot add non-opaque RGBA color to RGB palette

-----------------------------------------------------------------------------------------

StackTrace:

java.io.IOException: Command process failed with exit code 15. Error message: 
An exception occurred while executing the pipeline
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/_common.py", line 249, in cli_exception_handler
    return fn(options, plugin_manager)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipelines/ocr.py", line 175, in _run_pipeline
    pdfinfo = get_pdfinfo(
              ^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_pipeline.py", line 175, in get_pdfinfo
    return PdfInfo(
           ^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 1118, in __init__
    self._pages = _pdf_pageinfo_concurrent(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 777, in _pdf_pageinfo_concurrent
    executor(
  File "/usr/lib/python3.12/site-packages/ocrmypdf/_concurrent.py", line 78, in __call__
    self._execute(
  File "/usr/lib/python3.12/site-packages/ocrmypdf/builtin_plugins/concurrency.py", line 144, in _execute
    result = future.result()
             ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 726, in _pdf_pageinfo_sync
    return PageInfo(pdf, pageno, infile, check_pages, detailed_analysis)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 841, in __init__
    self._gather_pageinfo(pdf, pageno, infile, check_pages, detailed_analysis)
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 892, in _gather_pageinfo
    for info in _process_content_streams(
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 638, in _process_content_streams
    yield from _find_form_xobject_images(pdf, container, contentsinfo)
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 583, in _find_form_xobject_images
    yield from _process_content_streams(
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 638, in _process_content_streams
    yield from _find_form_xobject_images(pdf, container, contentsinfo)
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 583, in _find_form_xobject_images
    yield from _process_content_streams(
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 637, in _process_content_streams
    yield from _find_regular_images(container, contentsinfo)
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 553, in _find_regular_images
    yield ImageInfo(name=draw.name, pdfimage=pdfimage, shorthand=draw.shorthand)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/ocrmypdf/pdfinfo/info.py", line 363, in __init__
    pim = PdfImage(pdfimage)
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/pikepdf/models/image.py", line 831, in __init__
    self._jpxpil = self.as_pil_image()
                   ^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/pikepdf/models/image.py", line 740, in as_pil_image
    return Image.open(bio)
           ^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/PIL/Image.py", line 3318, in open
    im = _open_core(fp, filename, prefix, formats)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/PIL/Image.py", line 3304, in _open_core
    im = factory(fp, filename)
         ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/PIL/ImageFile.py", line 137, in __init__
    self._open()
  File "/usr/lib/python3.12/site-packages/PIL/Jpeg2KImagePlugin.py", line 224, in _open
    header = _parse_jp2_header(self.fp)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/site-packages/PIL/Jpeg2KImagePlugin.py", line 185, in _parse_jp2_header
    palette.getcolor(header.read_fields(">" + ("B" * npc)))
  File "/usr/lib/python3.12/site-packages/PIL/ImagePalette.py", line 144, in getcolor
    raise ValueError(msg)
ValueError: cannot add non-opaque RGBA color to RGB palette
    at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:190)
    at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:85)
    at stirling.software.SPDF.controller.api.converters.ConvertPDFToPDFA.pdfToPdfA(ConvertPDFToPDFA.java:102)
    at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
    at java.base/java.lang.reflect.Method.invoke(Method.java:580)
    at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:255)
    at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:188)
    at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:118)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:925)
    at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:830)
    at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
    at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1089)
    at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:979)
    at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1014)
    at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:914)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:547)
    at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:885)
    at jakarta.servlet.http.HttpServlet.service(HttpServlet.java:614)
    at org.eclipse.jetty.ee10.servlet.ServletHolder.handle(ServletHolder.java:736)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1614)
    at org.eclipse.jetty.ee10.websocket.servlet.WebSocketUpgradeFilter.doFilter(WebSocketUpgradeFilter.java:195)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at stirling.software.SPDF.config.MetricsFilter.doFilterInternal(MetricsFilter.java:61)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.springframework.web.filter.ServerHttpObservationFilter.doFilterInternal(ServerHttpObservationFilter.java:109)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
    at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:116)
    at org.eclipse.jetty.ee10.servlet.FilterHolder.doFilter(FilterHolder.java:205)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1586)
    at org.eclipse.jetty.ee10.servlet.ServletHandler$MappedServlet.handle(ServletHandler.java:1547)
    at org.eclipse.jetty.ee10.servlet.ServletChannel.dispatch(ServletChannel.java:814)
    at org.eclipse.jetty.ee10.servlet.ServletChannel.handle(ServletChannel.java:431)
    at org.eclipse.jetty.ee10.servlet.ServletHandler.handle(ServletHandler.java:464)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:571)
    at org.eclipse.jetty.ee10.servlet.SessionHandler.handle(SessionHandler.java:703)
    at org.eclipse.jetty.server.handler.ContextHandler.handle(ContextHandler.java:765)
    at org.eclipse.jetty.server.Server.handle(Server.java:179)
    at org.eclipse.jetty.server.internal.HttpChannelState$HandlerInvoker.run(HttpChannelState.java:619)
    at org.eclipse.jetty.server.internal.HttpConnection.onFillable(HttpConnection.java:411)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:322)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:99)
    at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:478)
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:441)
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:293)
    at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.run(AdaptiveExecutionStrategy.java:201)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:410)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:971)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.doRunJob(QueuedThreadPool.java:1201)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1156)
    at java.base/java.lang.Thread.run(Thread.java:1583)
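
When a log like this wraps a Python traceback inside a Java exception, the most useful parts for triage are usually the final Python exception line and the deepest Python frame. Below is a minimal sketch of pulling those out of the captured text with the standard library only. The helper name `summarize_traceback` and the `SAMPLE` string (abridged from the log above) are illustrative assumptions and not part of Stirling-PDF.

```python
import re

# Matches frames such as: File "/path/module.py", line 144, in getcolor
FRAME_RE = re.compile(r'File "([^"]+)", line (\d+), in (\w+)')
# Matches an unindented exception line such as "ValueError: message"
EXC_RE = re.compile(r'^(\w+(?:Error|Exception)): (.*)$', re.MULTILINE)

# Abridged copy of the log above, used only as sample input.
SAMPLE = '''java.io.IOException: Command process failed with exit code 15. Error message:
An exception occurred while executing the pipeline
Traceback (most recent call last):
  File "/usr/lib/python3.12/site-packages/PIL/Jpeg2KImagePlugin.py", line 185, in _parse_jp2_header
    palette.getcolor(header.read_fields(">" + ("B" * npc)))
  File "/usr/lib/python3.12/site-packages/PIL/ImagePalette.py", line 144, in getcolor
    raise ValueError(msg)
ValueError: cannot add non-opaque RGBA color to RGB palette
    at stirling.software.SPDF.utils.ProcessExecutor.runCommandWithOutputHandling(ProcessExecutor.java:190)
'''


def summarize_traceback(text):
    """Return the last Python exception and deepest frame, or None if absent."""
    frames = FRAME_RE.findall(text)
    errors = EXC_RE.findall(text)
    if not frames or not errors:
        return None
    path, line, func = frames[-1]      # the deepest frame is listed last
    exc_type, message = errors[-1]     # the raised exception is the last match
    return {
        "exception": exc_type,
        "message": message,
        "file": path,
        "line": int(line),
        "function": func,
    }


summary = summarize_traceback(SAMPLE)
print(summary)
```

For this report, the summary points at `getcolor` in Pillow's `ImagePalette.py`, which suggests the failure originates in image parsing by a dependency rather than in the Java conversion controller.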

Additional Information

No response

Browsers Affected

No response


sbplat commented 1 month ago

Hey @bjoern-vh, could you please share a reproducible example so we can take a look at it?

bjoern-vh commented 1 month ago

Yes, of course. Here is a single test page that triggers the error message.

test.pdf

Thanks a lot in advance 👍

sbplat commented 1 month ago

Here's the converted version of your PDF in case you need it. I'll push a fix for it now.

test_PDFA.pdf

Frooodle commented 1 month ago

Will do another release tonight or tomorrow morning with this fix.