ArtifexSoftware / pdf2docx

Open source Python library for converting PDF to DOCX.
https://pdf2docx.readthedocs.io
GNU Affero General Public License v3.0
2.46k stars 356 forks

Missing separators when converting pdf to docx #257

Open wwaguai opened 7 months ago

wwaguai commented 7 months ago

Hello,

I have noticed that when converting PDF files to DOCX with the pdf2docx library, the resulting DOCX file is missing the separators. Specifically, the horizontal lines that separate sections or paragraphs in the PDF are not preserved in the converted document.

Is there a way to retain these separators during conversion? I have attached a sample PDF file where the problem occurs.

Any guidance or assistance on resolving this matter would be greatly appreciated.

Thank you!

test_0122.pdf

dothinking commented 7 months ago

Thanks for providing the test file. Support for straight lines is a planned feature, but unfortunately it is not supported yet and might take some time.
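
For readers wondering what recognizing such "straight line" separators could involve, here is a minimal sketch. It is not pdf2docx's implementation. It assumes the page's line segments have already been extracted as `(x0, y0, x1, y1)` tuples in PDF points (for example from a drawing-extraction call such as PyMuPDF's `page.get_drawings()`), and the function name `find_horizontal_separators` and both thresholds are hypothetical choices for illustration.

```python
from typing import Iterable, List, Tuple

# Hypothetical segment type: (x0, y0, x1, y1) in PDF points.
Segment = Tuple[float, float, float, float]


def find_horizontal_separators(
    segments: Iterable[Segment],
    page_width: float,
    min_width_ratio: float = 0.5,  # assumed: a rule spans at least half the page
    max_drift: float = 0.5,        # assumed: allowed vertical drift in points
) -> List[Segment]:
    """Return segments that look like horizontal separator rules."""
    separators: List[Segment] = []
    for x0, y0, x1, y1 in segments:
        # Normalize so every segment runs left to right.
        if x0 > x1:
            x0, y0, x1, y1 = x1, y1, x0, y0
        is_flat = abs(y1 - y0) <= max_drift
        is_wide = (x1 - x0) >= min_width_ratio * page_width
        if is_flat and is_wide:
            separators.append((x0, y0, x1, y1))
    # Order by the y coordinate of the left endpoint.
    separators.sort(key=lambda s: s[1])
    return separators


if __name__ == "__main__":
    sample = [
        (72.0, 100.0, 523.0, 100.0),  # long horizontal rule
        (72.0, 200.0, 90.0, 200.0),   # short underline, too narrow
        (300.0, 50.0, 300.0, 700.0),  # vertical line
        (523.0, 400.2, 72.0, 400.0),  # long rule drawn right to left
    ]
    for sep in find_horizontal_separators(sample, page_width=595.0):
        print(sep)
```

Writing the detected rules into the DOCX (for instance as paragraph borders) is a separate step that this sketch does not cover.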

richa27gpt commented 7 months ago

I have the same request.