Byaidu / PDFMathTranslate

PDF scientific paper translation with preserved formats - AI-based full-text bilingual translation of PDF documents with the layout fully preserved; supports Google/DeepL/Ollama/OpenAI and other services; provides CLI/GUI/Docker
https://pdf2zh.com
GNU Affero General Public License v3.0
3.07k stars 219 forks

bug (main): non-pdf/a not translated #102

Closed Andy-AO closed 2 days ago

Andy-AO commented 3 days ago

Problem description

Running pdf2zh document.pdf -li en -lo zh produced document-zh.pdf and document-dual.pdf, but neither file contains any Chinese translation.

Platform: Windows 10, Python 3.11.5

(pdf2zh_ri) D:\software\green\pdf2zh_ri> pdf2zh document.pdf -li en -lo zh
D:\software\green\pdf2zh_ri\Lib\site-packages\doclayout_yolo\nn\tasks.py:733: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(file, map_location="cpu")
100%|████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:33<00:00,  4.77s/it]

Test documents

document.pdf document-zh.pdf document-dual.pdf

reycn commented 3 days ago

Same issue here: it is caused by the file not being in PDF/A format, and the feature is scheduled in #101.

For now, please convert the file into PDF/A so you can get a translated document.
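
One possible way to do that conversion from a script is sketched below. It assumes Ghostscript is installed, which this thread does not mention; the helper names build_pdfa_command and convert_to_pdfa are hypothetical and not part of pdf2zh. Strict PDF/A compliance may also need an ICC profile and a PDFA_def.ps file, so treat the output as a best-effort attempt rather than a validated PDF/A file.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdfa_command(src, dst, gs_executable="gs"):
    """Return a Ghostscript argument list for a PDF/A-2 conversion attempt.

    Hypothetical helper: Ghostscript is an assumption, not the tool
    used by the project itself.
    """
    return [
        gs_executable,                    # often "gswin64c" on Windows
        "-dPDFA=2",                       # request PDF/A-2 output
        "-dBATCH",                        # exit after processing the input
        "-dNOPAUSE",                      # do not prompt between pages
        "-sColorConversionStrategy=RGB",  # convert colours to a single space
        "-sDEVICE=pdfwrite",              # write a PDF file
        "-dPDFACompatibilityPolicy=1",    # drop features PDF/A does not allow
        f"-sOutputFile={dst}",
        str(src),
    ]


def convert_to_pdfa(src, dst, gs_executable="gs"):
    """Run Ghostscript and return the output path; raise if it is missing or fails."""
    if shutil.which(gs_executable) is None:
        raise FileNotFoundError(f"Ghostscript executable not found: {gs_executable}")
    subprocess.run(build_pdfa_command(src, dst, gs_executable), check=True)
    return Path(dst)
```

Calling convert_to_pdfa("document.pdf", "document-pdfa.pdf") and then running pdf2zh on document-pdfa.pdf would follow the workaround described above.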

Replication note:

image
reycn commented 2 days ago

Bug fixed in feat: pdf/a auto converter

image