Closed dlfischer-cmm closed 1 year ago
Thanks for opening this issue!
Your first solution using ->(n,f) { f.wrapped_font.missing_glyph_id } won't work correctly since the result of the lambda needs to be a Glyph object, not a Symbol (in case of Helvetica) or Integer (in case of OpenSans). This was changed 6 years ago.
The second solution using ->(n,f) { f.glyph(0) } will work for TrueType fonts since they use integer glyph IDs but not for Type1 fonts which use glyph names (symbols). You would need to base the result on the f argument.
Generally, the built-in fonts like Helvetica don't have a special glyph representing '.notdef', so you have to choose one of the available glyphs for representing a missing character. TrueType fonts, on the other hand, must have a glyph with ID=0 representing a missing glyph.
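To make the per-font choice concrete, here is a minimal plain-Ruby sketch that does not need HexaPDF. FakeFont and fallback_glyph_id are hypothetical names used only for illustration; the only thing assumed about a real font wrapper is that it responds to font_type, as in the lambda further down.

```ruby
# FakeFont is a hypothetical stand-in for a HexaPDF font wrapper.
FakeFont = Struct.new(:font_type)

# Returns the identifier of the glyph to fall back on for the given font.
def fallback_glyph_id(font)
  # Type1 fonts address glyphs by name, so an existing glyph such as
  # :question has to be picked. Other fonts (e.g. TrueType) use integer
  # IDs, and ID 0 is the glyph for missing characters.
  font.font_type == :Type1 ? :question : 0
end

p fallback_glyph_id(FakeFont.new(:Type1))    # => :question
p fallback_glyph_id(FakeFont.new(:TrueType)) # => 0
```

This only selects the identifier; the value returned from font.on_missing_glyph still has to be a Glyph object built from it.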
And yes, there is a difference in line break handling between Canvas#text and the TextLayouter class. The former uses the provided string and splits it on valid Unicode newline separators. Then those lines are converted to arrays of Glyph objects and those are directly rendered.
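As a rough illustration of the split-first approach, the following plain-Ruby sketch splits a string into lines before anything else happens. It assumes that Ruby's \R pattern is a fair approximation of the separators involved; split_lines is a hypothetical helper, not the actual Canvas#text code.

```ruby
# Simplified model: the string is cut into lines first, and only afterwards
# would each line be converted to glyphs on its own.
def split_lines(text)
  # \R treats "\r\n" as a single break and also matches other line break
  # characters; the -1 limit keeps trailing empty lines.
  text.split(/\R/, -1)
end

p split_lines("first\nsecond\r\nthird") # => ["first", "second", "third"]
p split_lines("tab\tstays")             # => ["tab\tstays"]
```

Because the line breaks are consumed before glyph conversion, they never reach font.on_missing_glyph in this model, whereas a tab stays in the line and still has to be mapped to a glyph.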
The TextLayouter transforms the whole given text string into an array of Glyph objects. During this process characters such as \t, \n, and so on are mapped by default (via font.on_missing_glyph) to InvalidGlyph instances. When laying out the text (i.e. the array of glyph objects) those InvalidGlyph objects are transformed into usable objects if they represent certain special characters like \t or \n.
Since this information isn't available anymore when font.on_missing_glyph is changed to always return the same glyph, line breaks, tabs, and so on cannot be identified anymore and therefore won't work.
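The effect can be reproduced with a toy model in plain Ruby. This is not HexaPDF's implementation: ToyGlyph, to_glyphs and break_lines are hypothetical names, and the only assumption is that a later layout step recognises a line break by the string a glyph stands for.

```ruby
# Toy glyph: an ID to draw plus the string it represents.
ToyGlyph = Struct.new(:id, :str)

# Maps each character to a glyph; unknown characters go through on_missing.
def to_glyphs(text, known, on_missing)
  text.each_char.map do |c|
    known.include?(c) ? ToyGlyph.new(c.ord, c) : on_missing.call(c)
  end
end

# Starts a new line whenever a glyph still represents "\n".
def break_lines(glyphs)
  lines = [[]]
  glyphs.each do |g|
    g.str == "\n" ? lines << [] : lines.last << g
  end
  lines.map { |line| line.map(&:str).join }
end

known = ("a".."z").to_a
same_glyph = ->(_c) { ToyGlyph.new(0, "?") } # original character is dropped
keeps_char = ->(c) { ToyGlyph.new(0, c) }    # same glyph ID, character kept

p break_lines(to_glyphs("ab\ncd", known, same_glyph)) # => ["ab?cd"]
p break_lines(to_glyphs("ab\ncd", known, keeps_char)) # => ["ab", "cd"]
```

In the first call the newline has become indistinguishable from any other missing character, so no break happens. In the second call the glyph to draw is the same, but the original character survives and the break is found.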
Use the following for font.on_missing_glyph:
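require 'hexapdf/font/type1_wrapper'
require 'hexapdf/font/true_type_wrapper'

# Make the Glyph classes of both wrappers accessible from outside.
HexaPDF::Font::Type1Wrapper.public_constant(:Glyph)
HexaPDF::Font::TrueTypeWrapper.public_constant(:Glyph)

# c is the missing character, f is the font wrapper in use.
doc.config['font.on_missing_glyph'] = lambda do |c, f|
  if f.font_type == :Type1
    # Type1 (e.g. Helvetica): reuse the question mark glyph.
    HexaPDF::Font::Type1Wrapper::Glyph.new(f.wrapped_font, :question, c)
  else
    # TrueType (e.g. OpenSans): glyph ID 0 is the missing glyph.
    HexaPDF::Font::TrueTypeWrapper::Glyph.new(f.wrapped_font, 0, c)
  end
end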
This way glyph objects are returned referencing a known, existing glyph in the font (the question mark for Helvetica and the missing glyph for OpenSans) but with different string representations (allowing the TextLayouter to do its work correctly with respect to newlines, tabs, etc.).
I will think about how to make this easier since this is probably something many people would want to do.
@dlfischer-cmm The next version of HexaPDF comes with a helper method that allows you to achieve the solution in an easier way. This is now also documented in the font.on_missing_glyph configuration option:
doc.config['font.on_missing_glyph'] = lambda do |character, font_wrapper|
  font_wrapper.custom_glyph(font_wrapper.font_type == :Type1 ? :question : 0, character)
end
Excellent! Thank you for the detailed explanation. My team has implemented the solution you provided and it's working perfectly in our tests. We appreciate your help! Have a great day. :)
Hello, my team has encountered a challenge using HexaPDF that we've been unable to solve. We're creating a PDF that displays user input. In some cases, users have inserted tab characters (\t) as content and this triggers a HexaPDF error "Glyph for "\t" missing". This needs to be handled gracefully in a production environment; an invalid glyph should not be a reason for PDF creation to fail. We have tried to use the doc.config['font.on_missing_glyph'] option without success and I think in the process revealed a bug in how text formatting (or at least line breaks) is handled when this config option is used.
Using the following example:
Results: When not using either Glyph solution, this code throws an error Glyph for "\t" missing regardless of the font used.
Comparing the results of using Glyph solutions 1 & 2 along with each font (Helvetica and Open Sans) yields unexpected results.
Type1 encoding has no codepoint for .notdef
canvas.text is OK but with frame.draw the formatting (including tabs and line returns) is lost. See sample screenshot 1 below.
The expected result is what you see in sample screenshot 2, but with the correct formatting for frame.draw and working for both fonts. In other words, replace missing glyphs with a default glyph and preserve the rest of the text formatting / line breaks.
Any solution you can offer is much appreciated!
Sample 1:
Sample 2: