gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

Expected return value for `font.on_missing_glyph`? #177

Closed eric-hemasystems closed 2 years ago

eric-hemasystems commented 2 years ago

I am occasionally getting a "Glyph for is missing" error. Your documentation where it says:

The standard 14 PDF fonts are those fonts that all PDF reading/viewing applications need to support. They only provide a limited set of glyphs

helped me determine that if I want a fuller range of character support, I need to embed my own TrueType font. This seems to have resolved much of the issue, but user data still occasionally contains characters that have no glyph in my font, such as a tab character.

Right now I am handling this by just squashing the error, which means none of the content is placed on the PDF. Ideally I would like to keep any user content that does have a glyph, and either replace the invalid characters with an empty string or possibly with a ? to indicate that an unsupported character was supposed to go here.
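For illustration, the desired behaviour can be sketched in plain Ruby, independent of HexaPDF. The helper name replace_unsupported and the SUPPORTED_CHARS set are assumptions made up for this sketch; a real font decides glyph coverage itself, so this only shows the intended outcome, not how the library does it.

```ruby
require 'set'

# Hypothetical helper, not part of HexaPDF: replaces every character that is
# not in `supported` with `replacement` (pass '' to drop it instead).
def replace_unsupported(text, supported, replacement = '?')
  text.each_char.map { |char| supported.include?(char) ? char : replacement }.join
end

# Assumed glyph coverage for this sketch: printable ASCII only.
SUPPORTED_CHARS = (' '..'~').to_set

puts replace_unsupported("Name:\tEric", SUPPORTED_CHARS)     # tab replaced by '?'
puts replace_unsupported("Name:\tEric", SUPPORTED_CHARS, '') # tab dropped
```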

The font.on_missing_glyph option seems perfect for that. Instead of returning a HexaPDF::Font::InvalidGlyph object (which triggers the exception), I could have it return my glyph of choice. The problem is that I don't know what object that is. Based on the code, it seems like it should be a HexaPDF::Font::TrueTypeWrapper::Glyph object, but that is a private class.

Even if I use const_get to bypass the protection, I'm not sure what to pass as the constructor arguments. I tried things like :question (based on this glyphlist), ?, and even 63 (the ASCII code for a question mark), all to no avail.

The HexaPDF::Font::TrueTypeWrapper#glyph method looks promising, but the documentation indicates it should not be used by an application even though it is public. Plus, to create an instance of HexaPDF::Font::TrueTypeWrapper I need a HexaPDF::Document, which I don't have when configuring font.on_missing_glyph on the DefaultDocumentConfiguration.

I must be missing something as I cannot seem to figure out what font.on_missing_glyph should return if we don't want the default behavior of returning a HexaPDF::Font::InvalidGlyph object.

gettalong commented 2 years ago

Yes, font.on_missing_glyph is exactly for such a situation.

As for the issue with modifying the DefaultDocumentConfiguration, you are right, the document instance would be needed there to facilitate setting the behaviour. Currently the third argument passed to the block is the wrapped font, i.e. nothing directly associated with a document. However, as per your explanation, it would make more sense to actually pass the wrapper font. Then you can easily use #decode_utf8 if needed or access the document via wrapper_font.pdf_object.document. I will change the behaviour for the next release.

Let me know if that helps!

eric-hemasystems commented 2 years ago

I attempted this. Here is how I'm creating and configuring my document:

HexaPDF::Document.new(io: background_io).tap do |doc|
  font = doc.fonts.add 'Source Sans Pro' # Embed font into document

  # Replace any unknown glyphs (such as tab) with a question mark
  unknown_glyph = font.decode_utf8('?').first
  doc.config['font.on_missing_glyph'] = proc { unknown_glyph }
end

background_io is a StringIO object of an existing PDF (downloaded from ActiveStorage). With this in place, I still get the missing glyph error. Furthermore, I replaced the proc with just a debugger statement:

doc.config['font.on_missing_glyph'] = proc { debugger }

When I ran my test it never stopped in the callback. If I do the same thing globally:

HexaPDF::DefaultDocumentConfiguration['font.on_missing_glyph'] = proc { debugger }

it does stop at my breakpoint, but of course now I don't have the document reference. I also tried this, which does stop at my breakpoint:

HexaPDF::Document.new(io: background_io, config: { 'font.on_missing_glyph' => proc { debugger } })

But of course this doesn't work either, because it doesn't give me a chance to create the glyph from the font that is tied to the document.

I dug into this a bit and I think it is due to some refactoring. With that commit, the proc is no longer pulled from the config at runtime when the callback is executed, but once when the font is initialized.
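To make the difference concrete, here is a minimal plain-Ruby sketch of the two lookup strategies. EagerFontWrapper and LazyFontWrapper are hypothetical stand-ins written for this illustration, not HexaPDF classes, and the symbols returned by the procs are placeholders for real glyph objects.

```ruby
# Hypothetical stand-in: reads the callback once, when the wrapper is created.
class EagerFontWrapper
  def initialize(config)
    @on_missing_glyph = config['font.on_missing_glyph'] # cached at initialization
  end

  def missing_glyph(char)
    @on_missing_glyph.call(char)
  end
end

# Hypothetical stand-in: looks the callback up every time it is needed.
class LazyFontWrapper
  def initialize(config)
    @config = config
  end

  def missing_glyph(char)
    @config['font.on_missing_glyph'].call(char) # read at call time
  end
end

# Creates both wrappers first, then changes the config, as in the report above.
def lookup_demo
  config = { 'font.on_missing_glyph' => proc { |_char| :invalid_glyph } }
  eager = EagerFontWrapper.new(config)
  lazy = LazyFontWrapper.new(config)

  config['font.on_missing_glyph'] = proc { |_char| :question_mark }

  [eager.missing_glyph("\t"), lazy.missing_glyph("\t")]
end

p lookup_demo
```

With the eager variant, a callback assigned after the wrapper exists is never seen, which matches the symptom of the debugger never stopping.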

My workaround is to create the closure before the font is embedded and then, after embedding the font, update the variable captured by the closure with the desired glyph.

@document ||= HexaPDF::Document.new(io: background_io).tap do |doc|
  unknown_glyph = nil
  doc.config['font.on_missing_glyph'] = proc { unknown_glyph }

  font = doc.fonts.add 'Source Sans Pro' # Embed font into document

  unknown_glyph = font.decode_utf8('?').first
end

This works for now, but if you are going into that code anyway to make the global config work, perhaps we can also ensure that a config value set after the font is defined is still used.
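The workaround relies on a Ruby proc capturing the variable itself rather than its value at the time the proc is created. A minimal plain-Ruby sketch of just that mechanism follows; closure_demo is a name made up for this illustration and the symbol stands in for the real glyph object.

```ruby
# Plain-Ruby illustration of the closure behaviour; no HexaPDF involved.
def closure_demo
  unknown_glyph = nil
  callback = proc { unknown_glyph } # captures the variable, not the value nil

  before = callback.call            # variable is still nil here

  unknown_glyph = :question_mark_glyph # stand-in for font.decode_utf8('?').first
  after = callback.call             # same proc now returns the new value

  [before, after]
end

p closure_demo
```

So even if the proc is read from the config only once, a later assignment to the captured variable changes what it returns.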

gettalong commented 2 years ago

I dug into this a bit and I think it is due to some refactoring. With that commit, the proc is no longer pulled from the config at runtime when the callback is executed, but once when the font is initialized.

You are right, this is indeed the problem. I will fix this in a bug fix release in the next few days. Thanks for the debugging! :pray:

gettalong commented 2 years ago

@eric-hemasystems I have changed the block signature for the font.on_missing_glyph configuration option and fixed the caching bug. Will release later today or tomorrow.

gettalong commented 2 years ago

@eric-hemasystems New version 0.21.0 is out with the fixes.