ArtifexSoftware / pdf2docx

Open source Python library for converting PDF to DOCX.
https://pdf2docx.readthedocs.io
GNU Affero General Public License v3.0
2.46k stars 356 forks

Images are rotated by 180° after converting to Word #297

Open liuts-plane opened 2 months ago

liuts-plane commented 2 months ago

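Hello, before opening this issue I looked through the other issues, for example #110.

Problem description: when I convert a PDF to Word from the command line, every image ends up rotated by 180°. The command I used: pdf2docx convert ./iso66.pdf iso66.docx

Problem PDF: iso66.pdf

My local version: (screenshot: pdfdocx)

Thank you very much for taking the time to answer!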

heweisheng commented 2 months ago

Check whether the fix in #292 works for you.

liuts-plane commented 2 months ago

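Thanks for the reply!!!

Result: the images are still rotated by 180°. My steps were:

pip3 uninstall pdf2docx
pip3 install git+https://github.com/ArtifexSoftware/pdf2docx.git@jules

Judging from earlier issues, this problem looks hard to solve, so there is no rush; please look at it whenever you have time.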

heweisheng commented 2 months ago

The commit has not been merged into the main branch yet. My changes are not on the main branch of this repository.

liuts-plane commented 2 months ago

I tried: pip3 install git+https://github.com/ArtifexSoftware/pdf2docx.git@98e92cb40c3f45dba7abd18f1cbac4171ee40e0c

The installation failed:


Cloning https://github.com/ArtifexSoftware/pdf2docx.git (to revision 98e92cb40c3f45dba7abd18f1cbac4171ee40e0c) to /tmp/pip-req-build-w2enmgk0
Running command git clone --filter=blob:none --quiet https://github.com/ArtifexSoftware/pdf2docx.git /tmp/pip-req-build-w2enmgk0
Running command git rev-parse -q --verify 'sha^98e92cb40c3f45dba7abd18f1cbac4171ee40e0c'
Running command git fetch -q https://github.com/ArtifexSoftware/pdf2docx.git 98e92cb40c3f45dba7abd18f1cbac4171ee40e0c
Running command git checkout -q 98e92cb40c3f45dba7abd18f1cbac4171ee40e0c
Resolved https://github.com/ArtifexSoftware/pdf2docx.git to commit 98e92cb40c3f45dba7abd18f1cbac4171ee40e0c
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [27 lines of output]
  Traceback (most recent call last):
    File "<string>", line 33, in load_requirements
  ModuleNotFoundError: No module named 'pip'

  During handling of the above exception, another exception occurred:

  Traceback (most recent call last):
    File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
      main()
    File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
      json_out['return_val'] = hook(**hook_input['kwargs'])
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
      return hook(config_settings)
             ^^^^^^^^^^^^^^^^^^^^^
    File "/tmp/pip-build-env-hv1a70__/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 325, in get_requires_for_build_wheel
      return self._get_build_requires(config_settings, requirements=['wheel'])
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/tmp/pip-build-env-hv1a70__/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 295, in _get_build_requires
      self.run_setup()
    File "/tmp/pip-build-env-hv1a70__/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 487, in run_setup
      super().run_setup(setup_script=setup_script)
    File "/tmp/pip-build-env-hv1a70__/overlay/lib/python3.11/site-packages/setuptools/build_meta.py", line 311, in run_setup
      exec(code, locals())
    File "<string>", line 60, in <module>
    File "<string>", line 36, in load_requirements
  ModuleNotFoundError: No module named 'pip'
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
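
The traceback above ends in a `load_requirements` helper that the build script runs and that tries to import `pip`, which is not importable inside pip's isolated build environment. As a minimal sketch of one way to avoid that import, assuming the helper only needs the requirement strings from a plain requirements file (the function body and its handling of comments and option lines are assumptions, not the project's actual code), the file can be read directly:

```python
from pathlib import Path


def load_requirements(path):
    """Return requirement strings from a requirements file without importing pip.

    Hypothetical stand-in for a pip-based parser: it keeps plain requirement
    lines and skips blanks, comments, and option lines such as "-r other.txt".
    """
    requirements = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.split(" #", 1)[0].strip()  # drop trailing inline comments
        if not line or line.startswith(("#", "-")):
            continue
        requirements.append(line)
    return requirements
```

Another commonly used workaround for build scripts that import modules from the host environment is pip's `--no-build-isolation` option; whether it is enough for this particular commit has not been verified here.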


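To rule out a problem with pip, I then tried: pip install git+https://github.com/ArtifexSoftware/pdf2docx.git@98e92cb40c3f45dba7abd18f1cbac4171ee40e0c
It failed with the same error.

To rule out a problem with my environment, I also installed the main branch: pip3 install git+https://github.com/ArtifexSoftware/pdf2docx.git
That installation succeeded.

heweisheng commented 2 months ago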

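I did not change that much. Just install the package and then apply the changes directly to the installed library files; that should be enough. Besides, the commit has not been merged yet, so why are you pulling it from the main repository?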
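
Patching the installed copy requires knowing where pip placed it. A small sketch for locating that directory with the standard library only (the helper name `package_dir` is made up for illustration):

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Return the directory of an installed top-level package, or None if it is not found."""
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Prints the folder that holds the installed pdf2docx sources, if any.
    print(package_dir("pdf2docx"))
```

Edits made this way are overwritten the next time the package is reinstalled or upgraded.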

liuts-plane commented 2 months ago

Thanks

That worked very well!!!

heweisheng commented 2 months ago

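My approach is not a great one, but I could not get the matrix transformation to work. There is also another problem in the source code: PNG images lose their transparency, which still needs to be handled.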

yanhuixie commented 2 months ago

@heweisheng Thanks a lot. I ran into the same problem, except that only some of the images, not all of them, are rotated, and by 90°. Your PR fixes it.

heweisheng commented 2 months ago

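I have now found some details that still need work: there is a stamp whose rotation is not a multiple of 90°, and this approach does not handle it. I am still working on the fix.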

heweisheng commented 2 months ago

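#298 reworks the matrix transformation flow. Could you test it and see whether anything is wrong? On my side it fixes the stamp rendering and the bug where PNG images lost their transparency, and my own tests of all the EXIF cases showed no problems. I still have not figured out why the inverse matrix is needed, but it now produces the expected data. @yanhuixie @liuts-plane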
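
For rotations that are not a multiple of 90°, such as the stamp mentioned above, the angle can be read from the 2x2 part of a transformation matrix. The following is only a sketch under stated assumptions, not the code from the PR: it assumes the PDF convention of a matrix (a, b, c, d, e, f) acting on row vectors, and a matrix that combines rotation with positive scaling and no mirroring. The function name is made up for illustration.

```python
import math


def rotation_degrees(a, b, c, d):
    """Rotation angle in degrees (normalised with modulo 360) encoded in [[a, b], [c, d]].

    For a pure rotation by t the matrix is [[cos t, sin t], [-sin t, cos t]],
    so the angle is atan2(b, a). Positive scaling does not change the result.
    """
    if a * d - b * c <= 0:
        # A non-positive determinant means the matrix is singular or mirrored.
        raise ValueError("matrix is singular or mirrored")
    return math.degrees(math.atan2(b, a)) % 360.0
```

A 180° case looks like `rotation_degrees(-1, 0, 0, -1)`. Whether undoing the angle means turning the image clockwise or counter-clockwise depends on the direction of the y axis in the target coordinate system, so the sign should be checked against a known sample.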

yanhuixie commented 2 months ago

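Thanks a lot, that was impressively fast! My documents are fairly simple, and your previous version already solved the problem for me. I am currently evaluating pdf2docx and may not end up using it: the results do not meet our needs yet, mainly in text flow layout and table parsing.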

JorjMcKie commented 1 week ago

Images may internally contain rotation (EXIF information). We need to check the image insertion matrix and appropriately reverse this action before outputting to word.
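
As a rough illustration of the EXIF side of this, the Orientation tag takes values 1 to 8, each implying one corrective operation for displaying the stored pixels upright. The lookup table below is a sketch, not code from pdf2docx; the operation names follow Pillow's `Image.Transpose` members, and the table and helper names are made up for illustration.

```python
# EXIF Orientation value -> name of the operation that displays the stored
# pixels upright. Names follow Pillow's Image.Transpose members, whose
# ROTATE_* operations turn the image counter-clockwise.
EXIF_ORIENTATION_FIX = {
    1: None,               # already upright
    2: "FLIP_LEFT_RIGHT",
    3: "ROTATE_180",
    4: "FLIP_TOP_BOTTOM",
    5: "TRANSPOSE",
    6: "ROTATE_270",
    7: "TRANSVERSE",
    8: "ROTATE_90",
}


def exif_fix(orientation):
    """Return the corrective operation name, or None if none is needed or the value is unknown."""
    return EXIF_ORIENTATION_FIX.get(orientation)
```

With Pillow installed, `ImageOps.exif_transpose(image)` applies the matching operation to an image directly.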