gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

hexapdf inspect gives Problem encountered: No Unicode mapping #315

Closed qooxzuub closed 1 month ago

qooxzuub commented 1 month ago

For the attached file I get a No Unicode mapping error with hexapdf 0.45.0:

$ hexapdf inspect /tmp/test3.pdf psd 1
begin_text
  set_text_matrix 11.208 0 0 11.208 93.72 590.6851
  set_font_and_size /T1_5 1
  move_text 1.076 1.4026
Problem encountered: No Unicode mapping for code point 4 in font OIAEGG+MTEX

The file renders OK in several viewers.

test3.pdf

qooxzuub commented 1 month ago

After a quick look at the code, I guess this might be the intended behaviour. Unfortunately, it makes psd in hexapdf inspect a lot less useful than it could be: the hexapdf binary exits immediately on encountering any character it can't convert to text, so you don't get to see any content following that character.

Could the error be handled by skipping over the troublesome character with a warning and then continuing, rather than exiting?

gettalong commented 1 month ago

Thanks for this idea! I will change the output so that \uFFFD (the Unicode replacement character) is output instead, and a warning is written to standard error:

$ hexapdf ins /tmp/test3.pdf psd 1
begin_text
  set_text_matrix 11.208 0 0 11.208 93.72 590.6851
  set_font_and_size /T1_5 1
  move_text 1.076 1.4026
No Unicode mapping for code point 4 in font OIAEGG+MTEX, using the Unicode replacement character
  text> �
end_text
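
For illustration, the fallback described above (substitute the replacement character and warn on standard error instead of aborting) could look roughly like the following. This is a minimal sketch and not HexaPDF's actual implementation: the method name decode_codes, the to_unicode hash and the err: parameter are all hypothetical and exist only for this example.

```ruby
# Hypothetical helper for illustration only; not part of HexaPDF's API.
REPLACEMENT_CHARACTER = "\uFFFD"

# Converts an array of character codes to a string using +to_unicode+
# (an assumed Hash from code to Unicode string). A code without a mapping
# yields U+FFFD and a warning on +err+ instead of raising an error, so
# decoding continues with the remaining codes.
def decode_codes(codes, to_unicode, font_name, err: $stderr)
  codes.map do |code|
    to_unicode.fetch(code) do
      err.puts "No Unicode mapping for code point #{code} in font #{font_name}, " \
               "using the Unicode replacement character"
      REPLACEMENT_CHARACTER
    end
  end.join
end
```

With a toy mapping such as {1 => "A", 2 => "B"}, calling decode_codes([1, 4, 2], mapping, "OIAEGG+MTEX") would return "A", the replacement character, and "B" joined together, and emit one warning for code 4. Because the warning goes to standard error, the operator listing on standard output stays clean and can still be piped to other tools.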