AllanCameron / PDFR

An R package to extract text from pdf.

Get encodings from type 1 fonts #2

Closed AllanCameron closed 5 years ago

AllanCameron commented 5 years ago

Some fonts have no encoding specified except in the font program itself (e.g. some book-form PDFs from Project Gutenberg). Although a chunk of the font program is binary, the header is plain text and contains an encoding map that can be used to resolve ligatures etc.
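
For illustration, here is a minimal Python sketch of the idea, not PDFR's actual implementation. It assumes the font program has already been extracted and decompressed from the PDF stream, and that the cleartext header uses the usual Type 1 form of `dup <code> /<glyphname> put` entries ahead of the `eexec` keyword that starts the encrypted portion. The function name `parse_type1_encoding` and the `LIGATURES` table are hypothetical names made up for this example.

```python
import re
from typing import Dict

# Matches header entries of the form "dup 174 /fi put".
_ENTRY = re.compile(rb"dup\s+(\d+)\s*/(\S+)\s+put")

# Hypothetical helper table: glyph names of common ligatures and the
# plain text they stand for.
LIGATURES = {"ff": "ff", "fi": "fi", "fl": "fl", "ffi": "ffi", "ffl": "ffl"}


def parse_type1_encoding(font_program: bytes) -> Dict[int, str]:
    """Return a {character code: glyph name} map read from the text header."""
    # Everything after "eexec" is encrypted binary, so only look before it.
    header = font_program.split(b"eexec", 1)[0]
    start = header.find(b"/Encoding")
    if start == -1:
        return {}  # no encoding array in the header
    # Fonts that use "/Encoding StandardEncoding def" have no dup/put
    # entries, so this yields an empty map for them.
    return {
        int(code): name.decode("ascii", errors="replace")
        for code, name in _ENTRY.findall(header[start:])
    }


sample = (
    b"%!PS-AdobeFont-1.0: Example\n"
    b"/Encoding 256 array\n"
    b"0 1 255 {1 index exch /.notdef put} for\n"
    b"dup 32 /space put\n"
    b"dup 174 /fi put\n"
    b"dup 175 /fl put\n"
    b"readonly def\n"
    b"currentdict end\n"
    b"currentfile eexec\n"
    b"\x80\x01 binary data follows"
)

encoding = parse_type1_encoding(sample)
# Character code 174 maps to the "fi" ligature, which expands to "fi".
text_for_174 = LIGATURES.get(encoding[174], encoding[174])
```

A map like this lets a character code that appears in a content stream be translated to a glyph name, and from there to ordinary text, when the PDF's font dictionary gives no encoding of its own.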

AllanCameron commented 5 years ago

Now implemented