ArtifexSoftware / pdf2docx

Open source Python library for converting PDF to DOCX.
https://pdf2docx.readthedocs.io
GNU Affero General Public License v3.0
2.46k stars 356 forks

ValueError: unsupported colorspace for 'png' #284

Closed bikerr closed 1 week ago

bikerr commented 4 months ago

  File "/usr/local/lib/python3.7/site-packages/pdf2docx/page/RawPage.py", line 67, in restore
    raw_dict = self.extract_raw_dict(**settings)
  File "/usr/local/lib/python3.7/site-packages/pdf2docx/page/RawPageFitz.py", line 33, in extract_raw_dict
    image_blocks = self._preprocess_images(**settings)
  File "/usr/local/lib/python3.7/site-packages/pdf2docx/page/RawPageFitz.py", line 118, in _preprocess_images
    return ImagesExtractor(self.page_engine).extract_images(settings['clip_image_res_ratio'])
  File "/usr/local/lib/python3.7/site-packages/pdf2docx/image/ImagesExtractor.py", line 159, in extract_images
    raw_dict = self._to_raw_dict(pix, bbox)
  File "/usr/local/lib/python3.7/site-packages/pdf2docx/image/ImagesExtractor.py", line 239, in _to_raw_dict
    'image': image.tobytes()
  File "/usr/local/lib/python3.7/site-packages/fitz/fitz.py", line 7300, in tobytes
    raise ValueError("unsupported colorspace for '%s'" % output)
ValueError: unsupported colorspace for 'png'

huberemanuel commented 2 months ago

A quick fix is to edit _to_raw_dict in pdf2docx/image/ImagesExtractor.py (line 239 in the traceback above) and change 'image': image.tobytes() to 'image': image.tobytes('jpg').
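If you would rather keep PNG output, a possible alternative is to convert the pixmap to RGB before serializing it. PyMuPDF raises this ValueError when a pixmap whose colorspace has more than three components (for example CMYK) is written as PNG, so that is the likely cause here, although the thread does not confirm it. The sketch below is not pdf2docx code: the helper name to_png_bytes and its convert_to_rgb parameter are hypothetical and exist only for illustration.

```python
def to_png_bytes(pix, convert_to_rgb=None):
    """Return PNG bytes for a PyMuPDF-style pixmap, converting to RGB if needed.

    Hypothetical helper, not part of pdf2docx or PyMuPDF.
    """
    if convert_to_rgb is None:
        # Imported lazily so the helper can be exercised without PyMuPDF.
        import fitz  # PyMuPDF

        def convert_to_rgb(p):
            # Pixmap(colorspace, source) returns a copy in the target colorspace.
            return fitz.Pixmap(fitz.csRGB, p)

    cs = pix.colorspace
    # PNG output only handles gray and RGB; more than 3 components
    # (e.g. CMYK) is what triggers "unsupported colorspace for 'png'".
    if cs is not None and cs.n > 3:
        pix = convert_to_rgb(pix)
    return pix.tobytes("png")
```

Assuming this helper were available inside ImagesExtractor.py, the patched entry would read 'image': to_png_bytes(image). Unlike the 'jpg' workaround, this keeps a lossless format, at the cost of a colorspace conversion.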

tengmao commented 1 month ago

I changed the version to v0.5.6, and it worked!

greendreamer commented 1 week ago

Closing this because there has been no response for an extended period. Feel free to open a new issue, but please include a reproducing example.