ArtifexSoftware / pdf2docx

Open source Python library for converting PDF to DOCX.
https://pdf2docx.readthedocs.io
GNU Affero General Public License v3.0

Some links disappear when converting #315

Open Adamkadaban opened 2 months ago

Adamkadaban commented 2 months ago

Description of the bug

Some links in the document disappear during conversion.

Screenshot from source pdf: image

Screenshot from conversion with Adobe converter: image

Screenshot from pdf2docx conversion: image

How to reproduce the bug

Installed with: pipx install pdf2docx

File is attached: CPTC_9_Report.pdf

Command run: pdf2docx convert CPTC_9_Report.pdf --pages 104,105

[INFO] Start to convert CPTC_9_Report.pdf
[INFO] [1/4] Opening document...
[INFO] [2/4] Analyzing document...
[INFO] [3/4] Parsing pages...
[INFO] (1/2) Page 105
[INFO] (2/2) Page 106
[INFO] [4/4] Creating pages...
[INFO] (1/2) Page 105
[INFO] (2/2) Page 106
[INFO] Terminated in 0.80s.
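
To confirm the missing links without relying on screenshots, the converted file can be inspected directly. The following is a minimal sketch using only the Python standard library, not part of pdf2docx: the helper name list_docx_hyperlinks is hypothetical, and it assumes links are stored as w:hyperlink elements backed by hyperlink relationships (links stored as HYPERLINK field codes would not be counted).

```python
import zipfile
import xml.etree.ElementTree as ET

# Standard OOXML namespaces
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"


def list_docx_hyperlinks(docx_file):
    """Return the targets of w:hyperlink elements in a DOCX main document.

    Hypothetical helper for illustration; docx_file may be a path or a
    binary file-like object.
    """
    with zipfile.ZipFile(docx_file) as z:
        body = ET.fromstring(z.read("word/document.xml"))
        try:
            rels = ET.fromstring(z.read("word/_rels/document.xml.rels"))
        except KeyError:
            rels = None  # no relationships part, so no external links

    # Map relationship ids to targets, keeping only hyperlink relationships
    targets = {}
    if rels is not None:
        for rel in rels.iter(f"{{{REL_NS}}}Relationship"):
            if rel.get("Type", "").endswith("/hyperlink"):
                targets[rel.get("Id")] = rel.get("Target")

    # Collect the target of every hyperlink element found in the body
    links = []
    for link in body.iter(f"{{{W_NS}}}hyperlink"):
        rid = link.get(f"{{{R_NS}}}id")
        if rid in targets:
            links.append(targets[rid])
    return links


# Example (assumes the output was written as CPTC_9_Report.docx):
# for url in list_docx_hyperlinks("CPTC_9_Report.docx"):
#     print(url)
```

Comparing this list against the links visible on the source PDF pages shows which ones were dropped.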

pipx list | grep pdf2docx

   package pdf2docx 0.5.8, installed using Python 3.11.2
    - pdf2docx

pdf2docx version

0.5.8

Operating system

Linux

Python version

3.11, 3.12

Adamkadaban commented 2 months ago

@greendreamer

pdf2docx is supported on several versions and tested in v3.11 and v3.12 too. Please use later versions of Python.

When installing with pip on Python 3.12, the same issue occurs.
