Open PhilterPaper opened 3 years ago
Any ligatures would likely have to be backed out, at least for word-splitting purposes, and then put back in if the word wasn't split through a ligature. If it was, the fragment on either side might contain a shorter ligature, requiring HarfBuzz::Shaper to be called again on both fragments. On the bright side, it's unlikely that decomposing a ligature into letters, or vice versa, will change the length of the word enough to require another pass with Text::KnuthPlass. The change should be small enough that the ratio (which affects glue length) could just be updated.
he stiffly brushed aside        original text
he sti|ffl|y brushed aside      H::S decides it wants to use the 'ffl' ligature
he stiff-ly brushed aside       T::KP decides it wants to split the line between 'stiff' and 'ly'
                                update glue sizing (ratio)
he sti|ff|-ly brushed aside     H::S now uses the 'ff' ligature
                                readjust glue sizing
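The back-out and re-shape sequence above can be sketched in a few lines. This is only an illustration in Python, not PDF::Builder, HarfBuzz::Shaper, or Text::KnuthPlass code. The `LIGATURES` table and the `ligate` and `split_and_religate` helpers are hypothetical stand-ins for what a real shaper would decide from the font.

```python
# Hypothetical ligature table, longest first. A real shaper consults the
# font's substitution tables instead of a fixed list like this.
LIGATURES = ["ffl", "ffi", "ff", "fi", "fl"]

def ligate(text):
    """Greedy longest-match substitution; returns a list of glyph clusters."""
    glyphs = []
    i = 0
    while i < len(text):
        for lig in LIGATURES:
            if text.startswith(lig, i):
                glyphs.append(lig)      # one ligature glyph covers several letters
                i += len(lig)
                break
        else:
            glyphs.append(text[i])      # plain single-letter glyph
            i += 1
    return glyphs

def split_and_religate(word, pos):
    """Split the plain-letter word at pos, add a hyphen, and shape each fragment again."""
    left, right = word[:pos], word[pos:]
    return ligate(left + "-"), ligate(right)

print(ligate("stiffly"))                 # shaped as a whole word
print(split_and_religate("stiffly", 5))  # shaped after the split
```

The point of the sketch is that the split is made on plain letters, and each fragment is shaped on its own afterwards, so a shorter ligature ('ff' in place of 'ffl') can still appear next to the hyphen.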
What if HarfBuzz::Shaper were called after Text::KnuthPlass? This might be feasible if ligatures are the only thing in play (no direction or alphabet changes, no font size changes, etc.). Presumably the substitution of ligatures (after the lines are already split) would just entail a small update to line ratios, to get back the desired alignment. This might not be the case for complex scripts such as Arabic or Indic languages, where glyph substitutions for various kinds of ligatures could entail substantial length changes.
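As a rough illustration of "just updating the ratio", here is a Python sketch of the usual Knuth-Plass adjustment ratio, recomputed after a line's natural width changes. The function name and all widths are made up for illustration; this is not the Text::KnuthPlass interface.

```python
def adjustment_ratio(natural, stretch, shrink, target):
    """Adjustment ratio for one line: how far its glue must stretch (positive)
    or shrink (negative) to reach the target width."""
    diff = target - natural
    if diff > 0:
        return diff / stretch if stretch > 0 else float("inf")
    if diff < 0:
        return diff / shrink if shrink > 0 else float("-inf")
    return 0.0

# Illustrative numbers only: a 300-unit line with 12 units of total stretch
# and 4 units of total shrink.
before = adjustment_ratio(natural=294.0, stretch=12.0, shrink=4.0, target=300.0)

# Assume substituting a ligature narrows the line by 1.5 units.
after = adjustment_ratio(natural=294.0 - 1.5, stretch=12.0, shrink=4.0, target=300.0)

print(before, after)
```

If the width change is small relative to the line's total stretch or shrink, the new ratio stays close to the old one and the chosen breakpoints remain reasonable. A large change, as might happen with Arabic or Indic substitutions, could push the ratio out of the acceptable range and call for a fresh line-breaking pass.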
Note that #2 is concerned more with word splitting of non-English text in Latin script, but much of it still applies to this ticket's area of interest, so be sure to look at both tickets when working on word and line splitting. Keep in mind that the only reason to worry about splitting a word is that a line needs to be split, and the best fit may be through a word (hyphenation, etc.).
The PDF::Builder package can typeset using HarfBuzz::Shaper to substitute a font's ligatures for sequences of lowercase letters. It does not currently call Text::KnuthPlass natively, but I plan to add this in the near future. Some potential problems arise when HarfBuzz::Shaper is used and decides to substitute ligatures. Text::KnuthPlass will then have to accept not just plain text, but also the HarfBuzz arrays of processed glyphs, which could include ligatures. How this will interact with word splitting (whose patterns and exceptions assume no ligatures) remains to be seen. We also need to think about word splitting in connected cursive scripts such as Arabic, in highly processed complex scripts such as Devanagari or Khmer, in bidirectional text (RTL scripts mixed with LTR), and in mixtures of these.
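One way to picture the interaction between hyphenation points (found on plain letters) and shaped glyphs is to check which candidate break positions fall inside a multi-letter glyph. The Python sketch below assumes each glyph is represented simply by the letters it covers. Real HarfBuzz output reports cluster indices rather than strings, and the helper name here is hypothetical.

```python
def breaks_inside_ligatures(clusters, break_positions):
    """Return the break positions (character offsets into the plain word)
    that do not coincide with a glyph boundary, i.e. that would cut a ligature."""
    boundaries = set()
    offset = 0
    for cluster in clusters:
        offset += len(cluster)   # each cluster covers len(cluster) letters
        boundaries.add(offset)
    return [p for p in break_positions if p not in boundaries]

# 'stiffly' shaped with an 'ffl' ligature; candidate breaks after 'stiff' (5)
# and, purely for illustration, after 'sti' (3).
clusters = ["s", "t", "i", "ffl", "y"]
print(breaks_inside_ligatures(clusters, [3, 5]))
```

Only breaks that land inside a ligature would need the ligature backed out and the fragments reshaped. Breaks that already sit on a glyph boundary could, under this simplified model, be used as they are.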