gettalong / hexapdf

Versatile PDF creation and manipulation for Ruby
https://hexapdf.gettalong.org

OpenType fonts not supported (subsetting)? #191

Closed. Opened by jslabovitz, closed 9 months ago

jslabovitz commented 2 years ago

I'm writing a typography-oriented application that depends on OpenType fonts. I'm actually using Harfbuzz to shape/position the text into glyphs, then drawing the glyphs using #show_glyphs. My initial experiments are a success — as long as I turn off subsetting. If I don't turn off subsetting, HexaPDF crashes while accessing the glyf table, which as I understand may not exist for all (any?) OpenType fonts. My minimal code & resulting crash:

doc = HexaPDF::Document.new
doc.fonts.add('/Users/johnl/Fonts/A/Adobe Caslon Pro/ACaslonPro-Regular.otf')
doc.write('/tmp/out.pdf')
/opt/homebrew/lib/ruby/gems/3.1.0/gems/hexapdf-0.24.2/lib/hexapdf/font/true_type/subsetter.rb:181:in `block in add_glyph_components': undefined method `[]' for nil:NilClass (NoMethodError)

            glyf[gid].components&.each {|cgid| use_glyph(cgid) }
                ^^^^^
    from /opt/homebrew/lib/ruby/gems/3.1.0/gems/hexapdf-0.24.2/lib/hexapdf/font/true_type/subsetter.rb:179:in `each'

Is it indeed true that I won't be able to use OpenType fonts in HexaPDF, and still be able to subset them? If so, do you have plans to add this?

gettalong commented 2 years ago

So first: Very cool for using Harfbuzz for shaping/positioning! :+1: The current shaping mechanism in HexaPDF only supports basic ligatures but using a real shaping engine like Harfbuzz has been on my TODO list for a long time. Are you using a Ruby interface for this or something else?

HexaPDF only supports TrueType fonts at the moment, i.e. those OpenType fonts that contain a glyf table. Since I found that many fonts are available in both formats, i.e. a version with a glyf table and another one with CFF, there was no urgency in implementing the latter tables. CFF tables are also a bit more complicated to parse and integrate. It might be possible to make use of ttfunk from the Prawn project, which recently gained support for CFF type fonts, but I haven't looked into this.

Note that .otf may contain either CFF tables or glyf tables but usually contain CFF tables.

So: You can't use CFF type OpenType fonts with HexaPDF at the moment. There are plans to implement support but this is a long term goal.

I'm quite surprised that using a CFF type font without subsetting works since the PDF objects created for the font tell the PDF viewer that it is a TrueType font. Therefore this may be coincidence and not work across all viewers.

Are you sure that those experiments use an OTF font with a CFF table and not a glyf table?
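
One way to answer that question without any font library is to read the file's sfnt table directory and look at the table tags. The following is a minimal sketch, not part of HexaPDF: the method names `sfnt_table_tags` and `outline_format` are made up for illustration, and TrueType Collections (`ttcf`) and WOFF wrappers are deliberately not handled.

```ruby
# Sketch only: the method names are hypothetical, not HexaPDF API.

# Returns the four-character table tags listed in an sfnt table directory.
def sfnt_table_tags(io)
  header = io.read(12)
  if header.nil? || header.bytesize < 12
    raise ArgumentError, 'not an sfnt font (header too short)'
  end
  raise ArgumentError, 'font collections are not handled' if header[0, 4] == 'ttcf'

  # Header layout: sfntVersion (uint32), numTables (uint16), then three
  # uint16 binary-search fields that are not needed here.
  _version, num_tables = header.unpack('Nn')

  # Each table record is 16 bytes: tag, checksum, offset, length.
  Array.new(num_tables) do
    record = io.read(16)
    if record.nil? || record.bytesize < 16
      raise ArgumentError, 'truncated table directory'
    end
    record[0, 4]
  end
end

# Classifies the outlines as :truetype (glyf), :cff (CFF or CFF2) or :unknown.
def outline_format(io)
  tags = sfnt_table_tags(io)
  if tags.include?('glyf')
    :truetype
  elsif tags.include?('CFF ') || tags.include?('CFF2')
    :cff
  else
    :unknown
  end
end

# Usage, with a placeholder path:
#   File.open('/path/to/font.otf', 'rb') { |f| outline_format(f) }
```

A result of `:cff` for a given `.otf` file would mean the file has no glyf table, which matches the `nil` seen in the subsetter backtrace above.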

jslabovitz commented 2 years ago

Hmm. I'm using macOS's Preview app to view the generated PDFs. Perhaps there is some magic fallback logic in macOS that allows these PDFs to work, even if they aren't entirely valid?

Unfortunately about 95% of my OpenType fonts are CFF format, not TT (glyf) format. And I've been maintaining this collection for ~20 years, so I'm not likely at this point to replace my ancient CFF fonts with TT versions. :-)

To experiment, I found an OTF font with a glyf table (https://www.fontsquirrel.com/fonts/Aller), and happily HexaPDF handles it just fine, with no crash on subsetting.

I might be up for helping to parse/subset the CFF data for embedding & subsetting. In the past, I wrote a large part of an OpenType parser/shaper (in Ruby), though abandoned it once I found Harfbuzz.

Speaking of Harfbuzz (and apologies for leading this issue into digressions): Several years ago, I wrote a basic gem to interact with the Harfbuzz library, using FFI. It's on Rubygems/github, and in fact I just updated it to fix a few things: http://github.com/jslabovitz/harfbuzz-gem

I spent some time in the last few days writing a small script that first writes basic text using the HexaPDF API, then using Harfbuzz. Unfortunately it's not exactly compatible with your current TextShaper class, as Harfbuzz needs pre-decoded UTF8 strings to do its shaping. But it should give you the basic idea of how shaping using Harfbuzz might work.

require 'harfbuzz'
require 'hexapdf'

def harfbuzz_shape(text, fragment, font_file:)
  font = fragment.style.font
  font_size = fragment.style.font_size
  #FIXME: use +/- for true/false
  features = fragment.style.font_features.select { |k, v| v }.keys.map(&:to_s)
  #FIXME: try to get font_file from the wrapped_font object, and cache the face for efficiency
  hb_face = Harfbuzz::Face.new(File.open(font_file, 'rb'))
  hb_font = Harfbuzz::Font.new(hb_face, font_size)
  buffer = Harfbuzz::Buffer.new
  buffer.add_utf8(text)
  buffer.guess_segment_properties
  Harfbuzz.shape(hb_font, buffer, features)
  buffer.normalize_glyphs
  glyph_infos = buffer.get_glyph_infos
  glyph_positions = buffer.get_glyph_positions
  fragment.items = []
  glyph_infos.each_with_index do |info, i|
    position = glyph_positions[i]
    advance = hb_font.glyph_advance_for_direction(info.codepoint, Harfbuzz::HB_DIRECTION_RTL)
    kern = advance - position.x_advance
    fragment.items << font.glyph(info.codepoint)
    fragment.items << kern unless kern == 0
  end
end

# https://www.fontsquirrel.com/fonts/Aller
font_file = '/Users/johnl/Fonts/A/Aller/Aller_Rg.ttf'
size = 100
text = 'WAVE first!'

doc = HexaPDF::Document.new
wrapped_font = doc.fonts.add(font_file)
style = HexaPDF::Layout::Style.new(
  font: wrapped_font,
  font_size: size,
  font_features: { kern: true, liga: true })
canvas = doc.pages.add([0, 0, 1000, 1000]).canvas

# standard text
fragment = HexaPDF::Layout::TextFragment.create(text, style)
# pp fragment.items
fragment.draw(canvas, 0, size * 2)

# Harfbuzz text
fragment = HexaPDF::Layout::TextFragment.create(text, style)
harfbuzz_shape(text, fragment, font_file: font_file)
# pp fragment.items
fragment.draw(canvas, 0, size * 1)

doc.write('/tmp/out.pdf')
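
The kerning step in the script above (`kern = advance - position.x_advance`) can be isolated into a small, library-free sketch. It assumes that both advances are expressed in font design units, which depends on how the HarfBuzz font scale is set up, and that the numeric items of a text fragment follow the convention of the PDF `TJ` operator, where a number is given in thousandths of the font size and a positive value moves the next glyph to the left. The method names and the hash keys are hypothetical.

```ruby
# Sketch only: names and hash keys are made up for illustration.

# Converts the difference between a glyph's default advance and its shaped
# advance (both in font design units) into a TJ-style adjustment expressed
# in thousandths of the font size.
def tj_adjustment(default_advance, shaped_advance, units_per_em)
  ((default_advance - shaped_advance) * 1000.0 / units_per_em).round(2)
end

# Interleaves glyph IDs with their non-zero adjustments, mirroring the
# "glyph, then kern unless zero" loop in the script above.
def interleave_adjustments(glyphs, units_per_em)
  glyphs.flat_map do |g|
    adjustment = tj_adjustment(g[:default_advance], g[:shaped_advance], units_per_em)
    adjustment.zero? ? [g[:gid]] : [g[:gid], adjustment]
  end
end

# Example with made-up glyph IDs and advances for a 1000 units/em font:
shaped = [
  { gid: 58, default_advance: 900, shaped_advance: 820 }, # tightened by kerning
  { gid: 36, default_advance: 650, shaped_advance: 650 }  # unchanged
]
p interleave_adjustments(shaped, 1000) # prints [58, 80.0, 36]
```

Dividing by the font's units per em matters for fonts that do not use 1000 units per em, which is common for TrueType outlines.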

gettalong commented 2 years ago

Regarding OpenType CFF support: The https://github.com/prawnpdf/ttfunk/ gem which is used by Prawn recently got support for OpenType fonts. So adding CFF support to HexaPDF via ttfunk might not be that hard but I haven't looked into it. All the code concerning CFF seems to be in https://github.com/prawnpdf/ttfunk/tree/master/lib/ttfunk/table/cff. The integration into Prawn is at https://github.com/prawnpdf/prawn/blob/master/lib/prawn/fonts/ttf.rb.

Note, however, that Prawn does font embedding a bit differently because they don't use composite fonts but simple PDF fonts. Hmm... and I just saw this: https://github.com/prawnpdf/prawn/blob/master/lib/prawn/fonts/ttf.rb#L367 where they hardcode the type to TrueType... So I'm not sure that Prawn itself already supports using OTF with CFF tables...
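
To make the difference concrete, here is a rough sketch of which font dictionary entries the PDF specification pairs with each outline format when a whole OpenType file is embedded (the `FontFile3` stream with `/Subtype /OpenType` needs PDF 1.6 or later). This is neither HexaPDF's nor Prawn's code: the constant and method names are hypothetical, and other embedding variants, such as a bare CFF program, are left out.

```ruby
# Sketch only: the constant and method names are hypothetical.
# Maps [font kind, outline format] to the font dictionary /Subtype, the
# font descriptor key holding the font program, and that stream's /Subtype.
EMBEDDING_KEYS = {
  [:simple, :truetype] =>
    { font_subtype: :TrueType, font_file_key: :FontFile2, stream_subtype: nil },
  [:simple, :cff] =>
    { font_subtype: :Type1, font_file_key: :FontFile3, stream_subtype: :OpenType },
  [:composite, :truetype] =>
    { font_subtype: :CIDFontType2, font_file_key: :FontFile2, stream_subtype: nil },
  [:composite, :cff] =>
    { font_subtype: :CIDFontType0, font_file_key: :FontFile3, stream_subtype: :OpenType }
}.freeze

# kind is :simple or :composite; outline_format is :truetype or :cff.
def embedding_keys(kind, outline_format)
  EMBEDDING_KEYS.fetch([kind, outline_format]) do
    raise ArgumentError, "unsupported combination: #{kind.inspect}, #{outline_format.inspect}"
  end
end

p embedding_keys(:composite, :cff)[:font_subtype] # prints :CIDFontType0
```

A writer that always emits the TrueType entries, as in the line linked above, would therefore describe a CFF font incorrectly, whether or not a particular viewer tolerates it.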

Thanks for the info about Harfbuzz and the code example! Will put that on my TODO list to have a deeper look later.

gettalong commented 9 months ago

@jslabovitz I'm closing this issue since the original question about OpenType font support has been answered.

Adding OpenType font support as well as providing integration with harfbuzz for better (and more correct) glyph positioning is on my ToDo list - thanks again for the pointer!