coherentgraphics / cpdf-binaries

PDF Command Line Tools binaries for Linux, Mac, Windows
GNU Affero General Public License v3.0

Font replacement support #28

Closed saginadir closed 5 years ago

saginadir commented 6 years ago

Hi guys :-) 👍 on the project!

I need to replace fonts in existing PDF files. I want to replace all the fonts in a file with "OpenDyslexia", instead of whatever the PDF authors chose.

Before I start developing something like that, I'm checking whether something already exists. Could anyone who reads this point me to the right place? Otherwise I'll just roll up my sleeves and develop it.

johnwhitington commented 6 years ago

Short answer: it's very hard. You have to deal with font encodings, special PDF-related font descriptors, differing font metrics etc.

The most likely candidate is to build on something like this:

https://www.pdflib.com/products/tet/

(Or convert, or even OCR, the files into another format, like Word, and change the font there. Not as outlandish as it sounds, actually.)
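
To make the encoding and metrics problems mentioned above concrete, here is a minimal sketch in plain Python. It is not how cpdf or any PDF library works internally. Python's `cp1252` and `mac_roman` codecs are only rough stand-ins for PDF's WinAnsiEncoding and MacRomanEncoding, and the width tables and the `text_width` helper are hypothetical, made-up values for illustration.

```python
# Sketch only: illustrates two reasons font replacement in a PDF is hard.
# The width tables below are invented numbers, not real font data.

# 1) Encodings: the bytes of a PDF text string only have meaning relative to
#    the font's encoding. The codecs here approximate two common PDF encodings.
raw = b"don\xd5t"
as_winansi = raw.decode("cp1252")      # byte 0xD5 is read as 'Õ'
as_macroman = raw.decode("mac_roman")  # byte 0xD5 is read as '’'

# 2) Metrics: glyph widths are given in thousandths of the font size, so the
#    same text occupies a different width once the font is swapped.
def text_width(text, widths, font_size, default=500):
    """Width of `text` in points, given per-glyph widths in 1/1000 units."""
    return sum(widths.get(ch, default) for ch in text) * font_size / 1000

ORIGINAL_WIDTHS = {"H": 722, "e": 444, "l": 278, "o": 500}     # hypothetical
REPLACEMENT_WIDTHS = {"H": 800, "e": 560, "l": 320, "o": 600}  # hypothetical

w_old = text_width("Hello", ORIGINAL_WIDTHS, 12)
w_new = text_width("Hello", REPLACEMENT_WIDTHS, 12)

print(repr(as_winansi), repr(as_macroman))
print(w_old, w_new)  # the replacement font needs more horizontal space
```

Because PDF content streams position text explicitly rather than reflowing it, a width difference like this means every line would have to be re-laid-out after the swap, on top of decoding the text correctly in the first place.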

saginadir commented 6 years ago

Yeah I will definitely look at it.

The arXiv-Vanity project converts arXiv articles into HTML web pages. They have their own way of decoding the files: they claim to work with LaTeX, but the articles are actually in PDF format, so I'm not sure how that works yet. I'll have to look at their source code and see how they do it.

But I think my end goal would be to do what arXiv-Vanity does, except that instead of converting to web pages, I'd convert back to a PDF with a different font and maybe a different structure.