PDF scientific paper translation and bilingual comparison based on font rules and deep learning, preserving formula and figure layout.
pip install pdf2zh
Execute the translation command in the command line to generate the translated document example-zh.pdf
and the bilingual document example-dual.pdf
in the current directory.
pdf2zh example.pdf
pdf2zh example.pdf -p 1-3,5
pdf2zh example.pdf -li en -lo ja
Hint: Starting from \ufb00
is English style ligature.
pdf2zh BDA3.pdf -f "(CM[^RT].*|MS.*|XY.*|MT.*|BL.*|.*0700|.*0500|.*Italic)" -c "(\(|\||\)|\+|=|\d|[\u0080-\ufaff])"
Document merging: PyMuPDF
Document parsing: Pdfminer.six
Multi-threaded translation: MathTranslate
Layout parsing: LayoutParser