gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

font.on_missing_glyph and emojis #268

Closed: robotbill closed this issue 11 months ago

robotbill commented 11 months ago

Hello,

Based on your comment here, I'm trying to use font.on_missing_glyph to render emojis with a different font. If I set the canvas font to NotoEmoji-Regular and then add the text, the emojis render correctly. However, when the canvas font is set to another font (Arimo-Regular) and I swap in the emoji font wrapper inside font.on_missing_glyph, the characters are not rendered correctly.

What am I missing?

Sample Code

require 'hexapdf'

document = HexaPDF::Document.new

arimo = document.fonts.add(File.open('Arimo-Regular.ttf'))
emoji_font = document.fonts.add(File.open('NotoEmoji-Regular.ttf'))

document.config['font.on_missing_glyph'] =
  proc do |character, font_wrapper|
    if emoji_font.wrapped_font == font_wrapper.wrapped_font
      HexaPDF::Font::InvalidGlyph.new(font_wrapper.wrapped_font, character)
    else
      emoji_font.decode_utf8(character).first
    end
  end

canvas = document.pages.add.canvas
canvas.font(arimo, size: 10)

canvas.text('Regular font', at: [20, 650])
canvas.text("\u{1F96F} \u{1F354}", at: [20, 600])

canvas.text('Emoji font:', at: [20, 550])

canvas.font(emoji_font, size: 10)
canvas.text("\u{1F96F} \u{1F354}", at: [20, 500])

document.write('emoji.pdf', optimize: true)

Output

[image: output]

Both fonts are available from Google Fonts:
https://fonts.google.com/noto/specimen/Noto+Emoji
https://fonts.google.com/specimen/Arimo

Thanks, Joe

gettalong commented 11 months ago

The font.on_missing_glyph configuration needs to return a glyph object from the same font. This is the reason why your code won't work. I will clarify the docs for font.on_missing_glyph in this regard.

When you call canvas.font you are setting the font to be used for canvas.text calls. The code therefore assumes that all glyph objects provided to canvas.show_glyphs (which is called from canvas.text) are for that font. This means that the emoji glyphs are interpreted in terms of the regular font.
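To make the failure mode concrete, here is a toy model in plain Ruby. The tables, IDs and helper names are made up for illustration and are not HexaPDF internals. The point is only that a glyph ID means something solely relative to the font that produced it, so an ID taken from the emoji font and shown with the regular font selects an unrelated glyph.

```ruby
# Toy glyph tables; the IDs and characters are invented for illustration.
REGULAR_FONT = {1 => 'A', 2 => 'B'}.freeze
EMOJI_FONT = {1 => "\u{1F96F}", 2 => "\u{1F354}"}.freeze

# Look up the glyph ID a font assigns to a character (nil if not covered).
def glyph_id_for(font, char)
  font.key(char)
end

# Render a glyph ID using whichever font is currently selected.
def render(current_font, glyph_id)
  current_font.fetch(glyph_id, '?')
end

burger_id = glyph_id_for(EMOJI_FONT, "\u{1F354}")
wrong = render(REGULAR_FONT, burger_id) # ID interpreted by the wrong font
right = render(EMOJI_FONT, burger_id)   # ID interpreted by its own font

puts "regular font shows: #{wrong}, emoji font shows: #{right}"
```
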

What you want to do is currently not built into HexaPDF, so you would need to segment the text yourself, something like:

parts = string.split(/(\u{1F96F}|\u{1F354})/).map.with_index do |str, index|
  index % 2 == 0 ? [arimo, str] : [emoji_font, str]
end
canvas.move_text_cursor(offset: [30, 250])
parts.each do |font, str|
  canvas.font(font)
  canvas.text(str) # automatically positions each part after the previous one
end
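
If the set of emoji is not known up front, the splitting step could be generalized along these lines. This is a minimal sketch in plain Ruby: `emoji?`, `segment_by_font` and `EMOJI_RANGES` are hypothetical names, and the codepoint ranges are a deliberately rough approximation rather than a complete Unicode emoji test.

```ruby
# Rough, incomplete approximation of "emoji" codepoints (an assumption for
# this sketch): Misc Symbols/Dingbats plus the main pictograph blocks.
EMOJI_RANGES = [0x2600..0x27BF, 0x1F300..0x1FAFF].freeze

# Hypothetical predicate: does this character fall into one of the ranges?
def emoji?(char)
  EMOJI_RANGES.any? { |range| range.cover?(char.ord) }
end

# Hypothetical helper: split text into [font_key, substring] runs so that
# each run can be shown with a single font.
def segment_by_font(text)
  text.each_char.chunk_while { |a, b| emoji?(a) == emoji?(b) }.map do |chars|
    [emoji?(chars.first) ? :emoji : :regular, chars.join]
  end
end

runs = segment_by_font("Bagel \u{1F96F} and burger \u{1F354}!")
runs.each { |font_key, str| puts "#{font_key}: #{str.inspect}" }

# With HexaPDF the runs could then be drawn as in the snippet above, e.g.:
#   fonts = {regular: arimo, emoji: emoji_font}
#   runs.each { |key, str| canvas.font(fonts[key]); canvas.text(str) }
```

The same shape would work for other scripts (for example Korean) by swapping the predicate for one that checks whichever font covers the character.
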

robotbill commented 11 months ago

> The font.on_missing_glyph configuration needs to return a glyph object from the same font.

Thank you, I was suspicious that it was something like that.

mockdeep commented 9 months ago

@gettalong is there no way to add fallback fonts? Aside from emojis, we have users who input information in various languages, such as Korean (조 은). The browser falls back to other system fonts (Noto on my system). It's pretty rough to have to manage that character by character.

gettalong commented 9 months ago

@mockdeep The text output is currently heavily optimized, since this part can get really slow if one is not careful.

I'm not sure it would make much sense in the context of the Canvas class, since there you only have low-level methods, i.e. you need to set the font explicitly with the #font method before using any of the text-showing methods.

However, it's different for the document layout functionality. There you can reasonably expect something like this.

So: Are you using the document layout functionality or the low-level Canvas class for your use-case?

mockdeep commented 9 months ago

@gettalong it looks like we're using a mixture of both. We use HexaPDF::Layout::TextLayouter to fit text and then draw it onto a canvas with coordinates. In short, we are placing text in arbitrary locations on existing PDFs. If there's a better way to handle it, we'd be happy to change it, especially if we can get better font support.

gettalong commented 9 months ago

@mockdeep Since you are using the document layout facilities to fit the text, it is enough to adapt that part since changing out the fonts needs to be done before/during fitting of the text.

I have pushed the fallback font feature to the top of my todo list.

mockdeep commented 9 months ago

Excellent, thanks so much!

gettalong commented 8 months ago

@robotbill @mockdeep Just a heads up: I have implemented support for fallback fonts; it will be in the next release.

require 'hexapdf'

HexaPDF::Composer.create('composer_fallback_fonts.pdf') do |composer|
  zapf_dingbats = composer.document.fonts.add('ZapfDingbats')
  composer.document.config['font.on_invalid_glyph'] = lambda do |codepoint, invalid_glyph|
    [zapf_dingbats.decode_codepoint(codepoint)]
  end
  composer.text('This text contains the scissors symbol ✂ which is not available in ' \
                'the default font Times but available in the set ZapfDingbats fallback ' \
                'font. Other symbols from ZapfDingbats like ✐ and ✈ can also be used.' \
                "\n\n❤ HexaPDF")
end

Output:

[image]

The example above only shows the usage of a single fallback font but it can be generalized to use any number of fallback fonts (by checking whether the glyph returned by #decode_codepoint is a valid one using the (also new) #valid? method).
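
A sketch of that generalization, assuming only what is described above: each fallback font responds to #decode_codepoint and the returned glyph responds to #valid?. The `fallback_chain` helper and the stub classes are hypothetical and exist only so the snippet runs without font files. Returning the original invalid glyph when nothing matches is also an assumption of this sketch.

```ruby
# Hypothetical helper: build a callable for 'font.on_invalid_glyph' that
# tries the given fallback fonts in order and uses the first valid glyph.
def fallback_chain(fallback_fonts)
  lambda do |codepoint, invalid_glyph|
    glyph = fallback_fonts.lazy
                          .map { |font| font.decode_codepoint(codepoint) }
                          .find(&:valid?)
    # Assumption: keep the invalid glyph if no fallback covers the codepoint.
    glyph ? [glyph] : [invalid_glyph]
  end
end

# Stand-ins for real font wrappers and glyphs, for demonstration only.
StubGlyph = Struct.new(:font_name, :codepoint, :valid) do
  def valid?
    valid
  end
end

StubFont = Struct.new(:name, :covered) do
  def decode_codepoint(codepoint)
    StubGlyph.new(name, codepoint, covered.include?(codepoint))
  end
end

korean = StubFont.new('Korean', ['조'.ord, '은'.ord])
emoji = StubFont.new('Emoji', [0x1F648])
callback = fallback_chain([korean, emoji])

puts callback.call(0x1F648, :invalid).first.font_name
puts callback.call('조'.ord, :invalid).first.font_name

# With real fonts this might be wired up as:
#   document.config['font.on_invalid_glyph'] = fallback_chain([noto, emoji_font])
```
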

mockdeep commented 8 months ago

@gettalong thanks so much! This will be really great for our users. Does it support multiple fallbacks? I'm imagining using Noto as our primary fallback, but I vaguely recall there being a handful of common emojis it doesn't support. Regardless, I think this is a huge win.

gettalong commented 8 months ago

@mockdeep Great that it helps!

Yes, you can use any number of fallback fonts and even decide on a glyph-by-glyph basis which one to use (which is useful for choosing the matching font style, i.e. regular, bold, italic, or bold+italic):

noto = document.fonts.add(path_to_noto_regular)
noto_bold = document.fonts.add(path_to_noto_bold)
document.config['font.on_invalid_glyph'] = lambda do |codepoint, invalid_glyph|
  if invalid_glyph.font_wrapper.wrapped_font.weight == 400 # normal weight
    [noto.decode_codepoint(codepoint)]
  else
    [noto_bold.decode_codepoint(codepoint)]
  end
end

Hmm... I should probably add a default implementation that does this 🤔

gettalong commented 8 months ago

@mockdeep I have simplified the use for the most common case of using fallback fonts:

require 'hexapdf'

HexaPDF::Composer.create('fallback.pdf') do |composer|
  composer.document.config['font.map'] = {
    'Emoji' => {none: '/home/thomas/.fonts/NotoEmoji-Regular.ttf',
                bold: '/home/thomas/.fonts/NotoEmoji-Bold.ttf'}
  }
  composer.document.config['font.fallback'] = ['Emoji']
  composer.text("asdfds \n asdfsd 🙈 Tdsfds", font_size: 20)
  composer.text("asdfds \n asdfsd 🙈 Tdsfds", font_size: 20, font: ['Times', variant: :bold])
end

results in

[image]

This is done by providing a default implementation for font.on_invalid_glyph that uses the new font.fallback configuration option.

So as a user you would just have to register the fonts you would like to use as fallback and then set them as fallback fonts. Everything else should just work.

mockdeep commented 8 months ago

@gettalong excellent! Thank you! And if one fallback doesn't have a particular symbol, I assume we can pass multiple? Something like:

composer.document.config['font.fallback'] = ['Emoji', 'LastResort']

It probably doesn't matter much, since 99.9% of missing glyphs will likely be handled by Noto.

gettalong commented 8 months ago

@mockdeep Yes, exactly.

robotbill commented 7 months ago

Thanks, this looks rad!