ArtifexSoftware / pdf2docx

Open source Python library for converting PDF to DOCX.
https://pdf2docx.readthedocs.io
GNU Affero General Public License v3.0
2.46k stars 356 forks

With multi_processing=True, the program hangs on Linux #266

Closed sunny6chen closed 4 months ago

sunny6chen commented 7 months ago
image

As the screenshot shows, once the PDF pages have been parsed, the whole program hangs.

Linux version: Linux version 3.10.0-1160.95.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Mon Jul 24 13:59:37 UTC 2023

sunny6chen commented 7 months ago

multi_processing mode writes page.json files to the local disk. If several programs run at the same time, won't that cause parsing errors?

tajinshi commented 5 months ago

It's an import problem. Modify the source code and it will work.

sunny6chen commented 4 months ago

multi_processing mode writes page.json files to the local disk. If several programs run at the same time, won't that cause parsing errors?

I overrode the method: _convert_with_multi_processing

This fixes the problem of converting multiple PDFs at the same time: prefix = f'pages_{uuid.uuid4().hex}'
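
For anyone who hits the same problem, here is a minimal sketch of the idea behind that prefix. It is not the library's code. The helper names `page_file_names` and `write_and_collect`, and the `{prefix}-{i}.json` naming pattern, are assumptions made for illustration only. The point is that each conversion run gets its own random prefix, so two runs sharing a working directory never read or delete each other's intermediate JSON files.

```python
import json
import os
import tempfile
import uuid


def page_file_names(worker_count, work_dir, prefix=None):
    """Build one intermediate JSON path per worker (hypothetical helper).

    A fresh uuid4-based prefix is generated per call unless one is given,
    so concurrent runs in the same directory get disjoint file names.
    """
    if prefix is None:
        prefix = f'pages_{uuid.uuid4().hex}'
    return [os.path.join(work_dir, f'{prefix}-{i}.json')
            for i in range(worker_count)]


def write_and_collect(names, payloads):
    """Write each payload to its own file, read everything back, then clean up.

    Stands in for "workers dump parsed pages, parent restores them";
    the real parsing step is deliberately left out of this sketch.
    """
    for name, payload in zip(names, payloads):
        with open(name, 'w', encoding='utf-8') as f:
            json.dump(payload, f)

    results = []
    for name in names:
        with open(name, encoding='utf-8') as f:
            results.append(json.load(f))
        os.remove(name)  # remove only the files that belong to this run
    return results
```

With a fixed prefix such as `'pages'`, two runs in the same directory would both produce the same file names and could overwrite or delete each other's data. With the random prefix, the two lists of names do not overlap. Writing the files under a per-run `tempfile.mkdtemp()` directory would be another way to get the same isolation.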