ArtifexSoftware / pdf2docx

Open source Python library for converting PDF to DOCX.
https://pdf2docx.readthedocs.io
GNU Affero General Public License v3.0

Error faced when running pdf2docx; please kindly assist! #203

Open richylyq opened 1 year ago

richylyq commented 1 year ago

Hi Devs,

I am seeing this error when converting a PDF to DOCX:

[INFO] Start to convert C:\Users\Richmond\Desktop\work\MideaInstallerFiles\Setup Guides\OTP Setup Guide.pdf
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
Traceback (most recent call last):
  File "C:\Users\Richmond\Desktop\personal\codes\day2daytask\pdfstuff\pdftools.py", line 94, in <module>
    func_dict[args.functions](convert2pdf, savepath)
  File "C:\Users\Richmond\Desktop\personal\codes\day2daytask\pdfstuff\pdftools.py", line 38, in pdftodocx
    cv.convert(savelocation)  # Converts all pages by default
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\converter.py", line 329, in convert
    self.parse(start, end, pages, **settings).make_docx(docx_filename, **settings)
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\converter.py", line 112, in parse 
    return self.load_pages(start, end, pages) \
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\converter.py", line 153, in parse_document
    self._pages.parse(self.fitz_doc, **kwargs)
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\page\Pages.py", line 37, in parse 
    raw_page.restore(**settings)
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\common\share.py", line 226, in inner
    objects = func(*args, **kwargs)
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\page\RawPage.py", line 66, in restore
    raw_dict = self.extract_raw_dict(**settings)
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\page\RawPageFitz.py", line 36, in extract_raw_dict
    shapes, images =  self._preprocess_shapes(**settings)
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\page\RawPageFitz.py", line 124, in _preprocess_shapes
    return paths.to_shapes_and_images(
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\shape\Paths.py", line 97, in to_shapes_and_images
    iso_shapes.extend(self.to_shapes())
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\shape\Paths.py", line 72, in to_shapes
    shapes.extend(path.to_shapes())
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\shape\Path.py", line 338, in to_shapes
    iso_shapes.extend(self._to_fills(fill_color))
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\shape\Path.py", line 366, in _to_fills
    fills.append(segments.to_fill(color))
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\shape\Path.py", line 228, in to_fill
    'color': rgb_value(color)
  File "C:\Users\Richmond\AppData\Local\Programs\Python\Python310\lib\site-packages\pdf2docx\common\share.py", line 170, in rgb_value
    num = len(components)
TypeError: object of type 'NoneType' has no len()
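The last frames of the traceback show where the conversion breaks: `Path._to_fills` passes a path's fill color to `rgb_value`, and that color is `None`, so `len(components)` raises the `TypeError`. As a hypothetical illustration only (this is not pdf2docx's actual code and not an official fix), here is a sketch of a `None`-tolerant color helper. It assumes the color arrives as 1 (gray), 3 (RGB) or 4 (CMYK) components in [0, 1]; the name `rgb_value_safe`, the `default` fallback and the simple CMYK formula are assumptions made for the example.

```python
# Hypothetical, None-tolerant variant of a gray/RGB/CMYK -> 0xRRGGBB helper.
# NOT pdf2docx's rgb_value; it only illustrates guarding the input that raised
# "TypeError: object of type 'NoneType' has no len()".
from typing import Optional, Sequence


def rgb_value_safe(components: Optional[Sequence[float]], default: int = 0x000000) -> int:
    """Convert color components in [0, 1] to an integer 0xRRGGBB value.

    Assumption: 1 component = gray, 3 = RGB, 4 = CMYK.
    Returns `default` when no color is given (None or an empty sequence).
    """
    if components is None or len(components) == 0:
        return default
    num = len(components)
    if num == 4:
        c, m, y, k = map(float, components)
        # naive CMYK -> RGB conversion (assumed formula, no color profile)
        r, g, b = (1 - c) * (1 - k), (1 - m) * (1 - k), (1 - y) * (1 - k)
    elif num == 3:
        r, g, b = map(float, components)
    elif num == 1:
        r = g = b = float(components[0])
    else:
        raise ValueError(f"unsupported number of color components: {num}")
    # clamp to [0, 1], scale to 0..255 and pack into a single integer
    r8, g8, b8 = (round(max(0.0, min(1.0, v)) * 255) for v in (r, g, b))
    return (r8 << 16) | (g8 << 8) | b8


if __name__ == "__main__":
    print(hex(rgb_value_safe((1.0, 0.0, 0.0))))  # 0xff0000
    print(hex(rgb_value_safe(None)))             # 0x0
```

Falling back to a default color only lets the conversion continue; whether a path with no fill color should be skipped instead, or rendered some other way, is a design decision for the library itself.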

More info about the document: it contains images, which I think is the reason for the error above. Wait... are images not supported?

Awaiting assistance, which I hope comes soon HAHA

My pdf2docx version: pdf2docx==0.5.6

dothinking commented 9 months ago

> Awaiting for assistance which I hope comes soon HAHA

Sorry for the late reply, coming half a year later. If you're still working on this, a test file would be much appreciated. 🤣 Or refer to #202 for a workaround.