boazsegev / combine_pdf

A Pure ruby library to merge PDF files, number pages and maybe more...
MIT License
734 stars, 156 forks

Special symbols in metadata #188

Closed: d-kononov closed this issue 3 years ago

d-kononov commented 3 years ago

Hello!

Is it possible to use special symbols in metadata? I tried to set producer = Fingerspitzengefühl and got:

[screenshot not preserved]

I checked this with the file's Get Info option. pdfinfo (a command-line tool), however, shows the correct value.

boazsegev commented 3 years ago

Hi @d-kononov,

Thank you for your question. This might be related to #192, which might answer it.

As stated over there:

The PDF specifications don't support UTF-8. Multi-lingual documents are handled by using font mappings, where ANSI letters are mapped to the international glyphs.

However, the title information has no font and no mapping, so I think reader software reads the title string as ANSI letters (maybe adding a UTF-8 BOM will fix that, but I'm not remotely sure) ...

... anyway, this was the case before PDF 2.0. I have no idea what the new standard looks like because it isn't available for free.

I have no idea how to change that.

Kindly, Boaz Segev.

conorom commented 3 years ago

I realize this issue is closed but I ran into the same thing and solved it by encoding the string as UTF-16: pdf.to_pdf({ producer: producer.encode('utf-16') })

The value then shows up correctly in exiftool and Acrobat file properties. See https://tex.stackexchange.com/a/245969 and https://stackoverflow.com/a/3063966
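
For anyone checking why this workaround helps, here is a minimal sketch in plain Ruby (it doesn't need combine_pdf). Ruby's 'UTF-16' encoder prepends a big-endian byte order mark (FE FF), which is the marker PDF readers use to recognize a text string as UTF-16 rather than single-byte text. The helper name pdf_text_string is hypothetical and not part of combine_pdf. The commented-out to_pdf call just repeats the one from the comment above.

```ruby
# Hypothetical helper (not part of combine_pdf): encode a metadata value
# as UTF-16 so that it starts with a byte order mark.
def pdf_text_string(value)
  value.encode('UTF-16')
end

producer = pdf_text_string('Fingerspitzengefühl')

# Inspect the first two bytes: they should be the big-endian BOM.
bom = producer.bytes.first(2)
puts bom.map { |b| format('%02X', b) }.join(' ')

# Assumed usage, as in the comment above (requires a CombinePDF object in `pdf`):
# pdf.to_pdf(producer: producer)
```

If the first two bytes are not FE FF, the reader will most likely fall back to treating the value as single-byte text, which would bring back the symptom from the original report.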