Add check that common ligatures are set up correctly

davelab6 commented 6 years ago

Observed behaviour

Some fonts have common ligatures that are misplaced (in liga instead of dlig), and we can add a simple check for these common cases (e.g. https://github.com/google/fonts/issues/1662).

Expected behaviour

Based on the current collection, we can likely do a statistical analysis of what the most common ligatures are in the latin-ext collection and give a recommendation for which OT feature tag each should be in (e.g. liga vs dlig vs clig).

felipesanches commented 5 years ago

This seems to be a useful tool for that purpose: https://github.com/googlei18n/fontreport

felipesanches commented 5 years ago

I wrote a small python script to list all discretionary ligatures in a given TTF.

I will try to adapt it to iterate on all fonts in the collection and perform the statistical analysis that @davelab6 suggested.

felipesanches commented 5 years ago

This is the current state of the script, by the way...

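import sys
from fontTools.ttLib import TTFont

if len(sys.argv) != 2:
  sys.exit(f"usage: {sys.argv[0]} fontfile.ttf")

fontname = sys.argv[1]
ttFont = TTFont(fontname)
if "GSUB" not in ttFont:
  sys.exit("Font lacks a GSUB table!")

gsub = ttFont["GSUB"].table

# Collect the indices of all lookups referenced by any 'dlig' feature record.
indices = set()
records = gsub.FeatureList.FeatureRecord if gsub.FeatureList else []
for f in records:
  if f.FeatureTag == 'dlig':
    indices.update(f.Feature.LookupListIndex)

dlig = [gsub.LookupList.Lookup[index] for index in sorted(indices)]

def ligatures(lookups):
  """Return one "components -> ligature" string per ligature rule."""
  ligs = []
  for lookup in lookups:
    for subtable in lookup.SubTable:
      # Extension lookups wrap the actual subtable in ExtSubTable.
      subtable = getattr(subtable, "ExtSubTable", subtable)
      # dlig may also reference non-ligature lookups; skip those.
      if not hasattr(subtable, "ligatures"):
        continue
      for glyph, glyph_ligs in subtable.ligatures.items():
        for ligature in glyph_ligs:
          lig = f"{glyph} {' '.join(ligature.Component)} -> {ligature.LigGlyph}"
          ligs.append(lig)
  return ligs

print('\n'.join(ligatures(dlig)))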

felipesanches commented 5 years ago

Among the families of the Google Fonts collection that are tagged with latin-ext subset on METADATA.pb, these are the discretionary ligatures that are declared in 3 or more families:

'c t -> c_t': 28
's t -> s_t': 27
'f j -> f_j': 17
'f t -> f_t': 15
'f i -> f_i': 15
'f l -> f_l': 15
't t -> t_t': 11
'T h -> T_h': 10
's p -> s_p': 9
'c h -> c_h': 8
'c k -> c_k': 8
'f f j -> f_f_j': 6
'longs t -> longs_t': 6
'uni041D uni0413 -> uni04A4': 6
'uni043D uni0433 -> uni04A5': 6
'g j -> g_j': 5
'f u -> f_u': 5
'f f t -> f_f_t': 5
'f f ij -> f_f_ij': 4
'f ij -> f_ij': 4
'uniFEB4 uniFEAE -> uniFD2A': 4
'uniFEB4 uniFEF0 -> uniFD17': 4
'uniFEB3 uniFEAE -> uniFD0E': 4
'uniFEB3 uniFEF0 -> uniFCFB': 4
'uniFEBC uniFEAE -> uniFD2B': 4
'uniFEBB uniFEAE -> uniFD0F': 4
'Iacute J acutecomb -> Iacute_J_acutecomb': 4
'Iacute J.loclNLD acutecomb -> Iacute_J_acutecomb': 4
'I.loclNLD J.loclNLD -> I_J.loclNLD.dlig': 4
'g u -> g_u': 4
'g uacute -> g_uacute': 4
'g ubreve -> g_ubreve': 4
'g uni01D4 -> uni006701D4': 4
'g ucircumflex -> g_ucircumflex': 4
'g udieresis -> g_udieresis': 4
'g uni01D8 -> uni006701D8': 4
'g uni01DA -> uni006701DA': 4
'g uni01DC -> uni006701DC': 4
'g uni01D6 -> uni006701D6': 4
'g ugrave -> g_ugrave': 4
'g uhungarumlaut -> g_uhungarumlaut': 4
'g umacron -> g_umacron': 4
'g uogonek -> g_uogonek': 4
'g uring -> g_uring': 4
'g utilde -> g_utilde': 4
'gbreve u -> gbreve_u': 4
'gbreve udieresis -> gbreve_udieresis': 4
'iacute j acutecomb -> iacute_j_acutecomb': 4
'iacute j.loclNLD acutecomb -> iacute_j_acutecomb': 4
'r t -> r_t': 4
'i.loclNLD j.loclNLD -> i_j.loclNLD.dlig': 4
'Q y -> Q_y': 4
'f f i -> f_f_i': 4
'f f l -> f_f_l': 4
'f f -> f_f': 4
'e e -> e_e': 3
'l l -> l_l': 3
'f i -> uniFB01': 3
'f l -> uniFB02': 3
'uniFEBC uniFEF0 -> uniFD21': 3
'uniFEBB uniFEF0 -> uniFD05': 3
'Q uni0233 -> uni00510233': 3
'I.loclNLD.c2sc J.loclNLD.c2sc -> i_j.loclNLD.sc.dlig': 3
'f f u -> f_f_u': 3
'f f umacron -> f_f_umacron': 3
'f umacron -> f_umacron': 3
'iacute.sc j.sc acutecomb -> iacute_j_acutecomb.sc': 3
'iacute.sc j.loclNLD.sc acutecomb -> iacute_j_acutecomb.sc': 3
'i.sc.loclNLD.sc j.loclNLD.sc -> i_j.loclNLD.sc.dlig': 3
'f f i -> uniFB03': 3
'f f l -> uniFB04': 3
'f b -> f_b': 3
'f f -> uniFB00': 3
'f h -> f_h': 3
'f k -> f_k': 3
'longs l -> longs_l': 3
't y -> t_y': 3
'f i -> fi': 3
'f l -> fl': 3
'c p -> c_p': 3
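
For reference, a per-family tally like the one above could be produced roughly as follows. This is only a sketch, not the script that generated these numbers: the function name `count_ligatures`, the `min_families` parameter and the sample data are made up for illustration, and it assumes the rule strings for each family have already been extracted (for example with a script like the one earlier in this thread).

```python
from collections import Counter

def count_ligatures(families, min_families=3):
    """Count in how many families each ligature rule appears.

    `families` maps a family name to the collection of rule strings
    (e.g. "c t -> c_t") gathered from all of its font files.
    """
    counts = Counter()
    for rules in families.values():
        # Using a set makes each rule count once per family,
        # even if several styles of that family declare it.
        counts.update(set(rules))
    return [(rule, n) for rule, n in counts.most_common() if n >= min_families]


# Hypothetical sample data, not taken from the real collection.
sample = {
    "Family A": {"c t -> c_t", "s t -> s_t"},
    "Family B": {"c t -> c_t"},
    "Family C": {"c t -> c_t", "s t -> s_t", "f j -> f_j"},
}
for rule, n in count_ligatures(sample, min_families=2):
    print(f"{rule!r}: {n}")
```

Counting per family rather than per file keeps large families with many styles from dominating the ranking.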

felipesanches commented 5 years ago

note: there are currently 603 families tagged subset: latin-ext

felipesanches commented 5 years ago

OK. I've been tweaking my script and it is still sort of a kludge, but some interesting facts are starting to surface.

For instance, I am now monitoring both dlig and liga, and there are some cases where the same ligature is set up differently among the families in the GFonts collection, such as here:

[Screenshot: 2019-01-25 03:26:20]

I will do the same for clig and then I'll try to come up with a fine report of the problematic ligatures in our collection.
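
One way such a comparison might be sketched, assuming the per-font data has already been collected as a mapping of font name to feature tag to rule strings (that structure and the names `features_per_rule` and `inconsistent_rules` are assumptions for illustration, not part of the actual script):

```python
from collections import defaultdict

def features_per_rule(fonts):
    """Map each ligature rule to {feature tag: [font names]}.

    `fonts` maps a font name to {feature tag: set of rule strings}.
    """
    usage = defaultdict(lambda: defaultdict(list))
    for fontname, features in sorted(fonts.items()):
        for tag, rules in features.items():
            for rule in rules:
                usage[rule][tag].append(fontname)
    return usage


def inconsistent_rules(fonts):
    """Return the rules that appear under more than one feature tag."""
    return {
        rule: dict(by_tag)
        for rule, by_tag in features_per_rule(fonts).items()
        if len(by_tag) > 1
    }


# Hypothetical font names and data, for illustration only.
fonts = {
    "FontA-Regular.ttf": {"liga": {"f i -> f_i"}, "dlig": {"s t -> s_t"}},
    "FontB-Regular.ttf": {"liga": {"f i -> f_i", "s t -> s_t"}},
}
print(inconsistent_rules(fonts))
```

Note that this also reports a rule when a single font declares it under two tags, so the result is a list of candidates to review rather than a list of errors.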

felipesanches commented 5 years ago

Alright... my current understanding is that it is OK to have both a liga and a dlig rule for the same sequence of ligature components. Also, the original problem that prompted @davelab6 to open this issue was that a few fonts in the collection have ligatures placed only in liga, and not in dlig.

The most common cases for this seem to be s c -> s_c and c t -> c_t. Does that sound right?

felipesanches commented 5 years ago

For instance, Almendra-Regular.ttf has an s_t which is declared as a standard ligature using the liga feature, whereas we would expect it to be declared as a discretionary ligature.

Or at least to have both dlig and liga for s + t, right?

felipesanches commented 5 years ago

[Screenshot: 2019-01-25 05:08:55]

[Screenshot: 2019-01-25 05:07:26]

[Screenshot: 2019-01-25 05:09:30]

m4rc1e commented 5 years ago

> For instance, Almendra-Regular.ttf has an s_t which is declared as a standard ligature using the liga feature, whereas we would expect it to be declared as a discretionary ligature.
>
> Or at least to have both dlig and liga for s + t, right?

I would expect s_t in this instance to be in liga only.

This check is a tricky one. I've always assumed that "liga" ligatures are there to improve the legibility of the text such as 'ffi' etc, whilst "dlig" ligatures are there for stylistic reasons.

Bear in mind that script fonts may need "liga" ligatures just to make the text legible. Such ligatures may never be needed in text typefaces.

Personally, I can't fathom how anyone could write a script that works this out for all edge cases without some machine vision algorithm in place. Perhaps this check should only be run on Serif and Sans Serif typefaces?

davelab6 commented 5 years ago

I think Adobe apps use "st" as the icon graphic for dlig?

I would definitely describe automatic liga substitutions as something designer users rarely want to turn off, and that end-user readers don't really notice.

"st" is archaic and very commonly a dlig discretionary ligature because readers notice it as really weird. While Almendra is a medieval style, I think it's overstepping to have "st" in liga. Should only be dlig.
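
A check along those lines could be sketched as below. This is a rough illustration only: the set `DISCRETIONARY_ONLY` is an assumed, hand-picked list of sequences (which sequences belong in it is exactly what this thread is debating), the function names are hypothetical, and the input is assumed to be a mapping of feature tag to rule strings in the "s t -> s_t" form used earlier.

```python
# Assumption for illustration: input sequences expected to live in dlig only.
DISCRETIONARY_ONLY = {("s", "t"), ("c", "t")}

def components(rule):
    """Split a rule string such as "s t -> s_t" into its input glyph names."""
    return tuple(rule.split(" -> ")[0].split())

def misplaced_in_liga(features, discretionary=DISCRETIONARY_ONLY):
    """List rules under 'liga' whose input sequence is expected in 'dlig' only."""
    return sorted(
        rule for rule in features.get("liga", ())
        if components(rule) in discretionary
    )


# Hypothetical feature data for a single font.
features = {"liga": {"s t -> s_t", "f i -> f_i"}, "dlig": set()}
print(misplaced_in_liga(features))
```

Script typefaces that rely on liga for legibility would probably need to be exempted or handled separately.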

laerm0 commented 5 years ago

Hey @felipesanches –

So I can handle https://github.com/google/fonts/issues/1662, would it be possible to edit your script so that it outputs font names with their ligature features? Something like tab-/comma-separated text with a header field like fontname,liga,dlig,clig and then an x in the ligature columns so I can paste it into a spreadsheet?

felipesanches commented 5 years ago

yes, @laerm0. I will try to make these changes to the script and I will get back to you shortly with a new version that you can use for that purpose.
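
A minimal sketch of what that output could look like, using Python's standard csv module. The function name `write_feature_csv`, the font names and the input structure (font name to feature tag to rule strings) are assumptions for illustration and not the final script:

```python
import csv
import io

TAGS = ("liga", "dlig", "clig")

def write_feature_csv(fonts, out, tags=TAGS):
    """Write one row per font, with an "x" for each feature that has ligatures."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["fontname", *tags])
    for fontname, features in sorted(fonts.items()):
        marks = ["x" if features.get(tag) else "" for tag in tags]
        writer.writerow([fontname, *marks])


# Hypothetical font names and data, for illustration only.
fonts = {
    "FontA-Regular.ttf": {"liga": {"f i -> f_i"}, "dlig": {"s t -> s_t"}},
    "FontB-Regular.ttf": {"liga": {"f i -> f_i", "s t -> s_t"}},
}
buffer = io.StringIO()
write_feature_csv(fonts, buffer)
print(buffer.getvalue(), end="")
```

Passing `delimiter="\t"` to `csv.writer` would give tab-separated output instead.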

fonttools / fontbakery

Add check that common ligatures are set up correctly #2024