adobe-type-tools / afdko

Adobe Font Development Kit for OpenType
https://adobe-type-tools.github.io/afdko/
Other

Reordering of ligature substitution rules is considered harmful #1727

Open khaledhosny opened 7 months ago

khaledhosny commented 7 months ago

The Feature File Specification, §5.d, states that:

A contiguous set of ligature rules does not need to be ordered in any particular way by the font editor; the implementation software must do the appropriate sorting. So:

sub f f     by f_f;
sub f i     by f_i;
sub f f i   by f_f_i;
sub o f f i by o_f_f_i;

will produce an identical representation in the font as:

sub o f f i by o_f_f_i;
sub f f i   by f_f_i;
sub f f     by f_f;
sub f i     by f_i;
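A minimal Python sketch of what "the appropriate sorting" could look like. It assumes a hypothetical glyph order and a tie-break on component glyph names, because the spec excerpt does not say how equal-length rules are ordered. Rules are grouped into ligature sets by first glyph, the sets are ordered by that glyph's GID (as a Coverage table requires), and each set is sorted longest-first. With this fully deterministic key, both orderings from the spec yield the same structure.

```python
from collections import defaultdict

# Hypothetical glyph order; only the first glyphs of the rules matter here.
GLYPH_ORDER = [".notdef", "f", "i", "o"]
GID = {name: gid for gid, name in enumerate(GLYPH_ORDER)}

def build_ligature_sets(rules):
    """Group (components, ligature) rules into ligature sets keyed by first glyph.

    Sets are ordered by the GID of their first glyph (Coverage order). Within a
    set, longer component sequences come first; equal lengths are tie-broken by
    component glyph names, an assumed rule the spec text does not state.
    """
    sets = defaultdict(list)
    for components, ligature in rules:
        sets[components[0]].append((components, ligature))
    return [
        (first, sorted(members, key=lambda rule: (-len(rule[0]), rule[0])))
        for first, members in sorted(sets.items(), key=lambda item: GID[item[0]])
    ]

# The two rule orders from the spec excerpt.
editor_order = [
    (("f", "f"), "f_f"),
    (("f", "i"), "f_i"),
    (("f", "f", "i"), "f_f_i"),
    (("o", "f", "f", "i"), "o_f_f_i"),
]
other_order = [editor_order[3], editor_order[2], editor_order[0], editor_order[1]]

assert build_ligature_sets(editor_order) == build_ligature_sets(other_order)
for first, members in build_ligature_sets(editor_order):
    print(first, [ligature for _, ligature in members])
# f ['f_f_i', 'f_f', 'f_i']
# o ['o_f_f_i']
```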

There are several issues with this:

  1. It is very surprising to users, since the code has one order and the binary silently gets a different order, and the order matters as it controls which substitution is applied first.
  2. There is no way to prevent this automatic reordering, other than splitting each substitution into its own lookup, which is wasteful and unnecessary.
  3. The sorting algorithm is undocumented, so there is no clear way to verify that implementations are implementing it compatibly.
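To illustrate point 1, here is a simplified simulation of a single ligature lookup. The ligatures in a set are tried in table order and the first full match wins, so the stored order decides which substitution applies. This is a sketch that ignores lookup flags, mark skipping and multiple lookups; `apply_ligature_lookup` is a hypothetical helper, not a fontTools or HarfBuzz API.

```python
def apply_ligature_lookup(glyphs, ligature_sets):
    """Simulate one GSUB ligature lookup over a glyph list.

    At each position, try the ligatures keyed by the current glyph in table
    order; apply the first whose components match, then resume after it.
    """
    out, i = [], 0
    while i < len(glyphs):
        for components, ligature in ligature_sets.get(glyphs[i], []):
            if tuple(glyphs[i:i + len(components)]) == components:
                out.append(ligature)
                i += len(components)
                break
        else:  # no ligature matched at this position
            out.append(glyphs[i])
            i += 1
    return out

as_written = {"f": [(("f", "f"), "f_f"), (("f", "f", "i"), "f_f_i")]}
longest_first = {"f": [(("f", "f", "i"), "f_f_i"), (("f", "f"), "f_f")]}

print(apply_ligature_lookup(["f", "f", "i"], as_written))     # ['f_f', 'i']
print(apply_ligature_lookup(["f", "f", "i"], longest_first))  # ['f_f_i']
```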

I think this sorting should be deprecated and dropped, or, if backward compatibility is a concern, there should be a way to disable it.

frankrolf commented 7 months ago

I remember a time when the mantra “longer ligatures first” was important. I only found out about the re-ordering when trying to demonstrate this problem in one of my workshops.

I can see how this behavior might be considered a theoretical problem, but I think the benefits outweigh this concern. It seems natural for users to write shorter substitutions first.

That said, do you have a practical example where this re-sorting would cause actual harm?

FWIW, the sorting algorithm seems to be here: https://github.com/adobe-type-tools/afdko/blob/develop/c/makeotf/lib/hotconv/GSUB.c#L1730-L1768

khaledhosny commented 7 months ago

See https://forum.glyphsapp.com/t/prioritizing-certain-ligatures/19433/14 for an example.

skef commented 7 months ago

I don't see us just removing this part of the spec. Documenting the ordering requirement could be valuable, although there are a lot of things like this in the older parts of the spec and that horse may have left the barn. (We can document what AFDKO does, but that doesn't mean other implementations will update their algorithms if those differ.)

We could add a flag to disable the sorting, but that would operate on a font-wide basis.

Seems like it might be better to add some sort of "explicit" command, similar to "subtable", that blocks any reordering within a lookup at the point where it is used.
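One possible reading of such a command, sketched for a single ligature set: rules are still sorted longest-first, but never across the point where the command appears. The `BARRIER` sentinel and `sort_with_barriers` are purely hypothetical; no such statement exists in the Feature File syntax today.

```python
# Hypothetical sentinel standing in for an "explicit"-style statement.
BARRIER = object()

def sort_with_barriers(statements):
    """Sort ligature rules longest-first without moving any rule across a barrier.

    Each rule is tagged with the index of the segment it was written in.
    Python's sort is stable, so sorting by (segment, -length) keeps equal-length
    rules in declaration order within their segment.
    """
    keyed, segment = [], 0
    for stmt in statements:
        if stmt is BARRIER:
            segment += 1
        else:
            keyed.append((segment, stmt))
    keyed.sort(key=lambda item: (item[0], -len(item[1][0])))
    return [rule for _, rule in keyed]

statements = [
    (("f", "f"), "f_f"),  # deliberately kept ahead of the longer rule
    BARRIER,
    (("f", "f", "i"), "f_f_i"),
    (("f", "i"), "f_i"),
]
print([lig for _, lig in sort_with_barriers(statements)])
# ['f_f', 'f_f_i', 'f_i']
```

Unlike a font-wide flag, this keeps the default sorting everywhere except where the author explicitly asks otherwise.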

khaledhosny commented 7 months ago

FWIW, the sorting algorithm seems to be here: https://github.com/adobe-type-tools/afdko/blob/develop/c/makeotf/lib/hotconv/GSUB.c#L1730-L1768

This sorts by length and GID, which is doubly bad. Sorting by length is understandable, though misguided, but sorting by GID makes no sense.
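A sketch of why a GID tie-break is surprising, based only on the description in this thread rather than on GSUB.c itself. It assumes the tie-break compares component GIDs. With this key, merely changing the font's glyph order (hypothetical orders below) changes the order of equal-length ligatures, without touching the feature code.

```python
def sort_length_then_gid(rules, glyph_order):
    """Longest first; equal lengths ordered by component GIDs (assumed tie-break)."""
    gid = {name: i for i, name in enumerate(glyph_order)}
    return sorted(rules, key=lambda rule: (-len(rule[0]), [gid[g] for g in rule[0]]))

declared = [(("f", "f"), "f_f"), (("f", "i"), "f_i")]
order_a = [".notdef", "f", "i"]  # hypothetical glyph order
order_b = [".notdef", "i", "f"]  # same glyphs, "i" moved earlier

print([lig for _, lig in sort_length_then_gid(declared, order_a)])  # ['f_f', 'f_i']
print([lig for _, lig in sort_length_then_gid(declared, order_b)])  # ['f_i', 'f_f']
```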

  3. The sorting algorithm is undocumented, so there is no clear way to verify that implementations are implementing it compatibly.

Case in point, FontTools only sorts by length https://github.com/fonttools/fonttools/blob/fa59ada1b557bc304c592a2ca91c6b99ff6d241d/Lib/fontTools/otlLib/builder.py#L1570

Lorp commented 7 months ago

Is the sort by glyphId simply to ensure consistent results between different sort algos?

khaledhosny commented 7 months ago

I don’t think there is any point in sorting by GID, as it changes the meaning of the code and is far worse than sorting by length, since that one is at least potentially desirable.

Lorp commented 7 months ago

Right, I was assuming the sort by GID was a secondary sort after the sort by length. Still, that could be confusing if you have some equal-length subs that you need to happen in sequence.

anthrotype commented 5 months ago

FontTools only sorts by length

Well, actually it sorts by length first and secondarily sorts alphabetically by the ligature component glyph names. fea-rs, I believe, sorts by length and then GID, similar to makeotf if I understand correctly. I can see situations where the sorting is undesirable altogether. Ideally one should be able to opt out. For the default behavior I suppose we should stick to one officially documented ordering.
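A small sketch of how the two tie-breaks described here can disagree. Both keys sort longest-first; one breaks ties by component glyph names (as described for fontTools), the other by component GIDs (as described for makeotf and fea-rs). The glyph order is hypothetical, and these key functions only paraphrase the descriptions above; they are not the actual implementations.

```python
# Hypothetical glyph order in which "i" precedes "f".
GLYPH_ORDER = [".notdef", "i", "f"]
GID = {name: gid for gid, name in enumerate(GLYPH_ORDER)}

def by_length_then_name(rule):
    components, _ligature = rule
    return (-len(components), components)

def by_length_then_gid(rule):
    components, _ligature = rule
    return (-len(components), [GID[g] for g in components])

declared = [(("f", "i"), "f_i"), (("f", "f"), "f_f")]
print([lig for _, lig in sorted(declared, key=by_length_then_name)])  # ['f_f', 'f_i']
print([lig for _, lig in sorted(declared, key=by_length_then_gid)])   # ['f_i', 'f_f']
```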

cmyr commented 5 months ago

So I've been revisiting this question along with @anthrotype, because there was a slight difference in the sorting behaviour of fea-rs (rust) and feaLib (python, fonttools) for these ligature rules, and for purposes of testing we try to have these two tools generate the same output wherever it is (ahem) feasible.

Currently, fea-rs matches afdko, but feaLib uses glyph names, not GIDs, to determine the ordering within a given LigatureSet table. We are now looking at standardizing on a single sorting approach that considers only length and is stable (preserving the order declared in the input) for ligatures within a ligature set. That is, given the following FEA,

sub f i by f_i;
sub f f f by f_f_f;
sub f f by f_f;
sub f f i by f_f_i;

we will end up with the final ordering,

f_f_f
f_f_i
f_i
f_f
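The proposed ordering is simply a stable sort on a length-only key. In Python, `sorted()` is guaranteed stable, so a minimal sketch (the helper name is illustrative) reproduces the ordering above.

```python
def sort_ligature_set(rules):
    """Longest component sequence first; equal lengths keep declaration order.

    sorted() is stable, so a length-only key preserves the written order of
    equal-length rules.
    """
    return sorted(rules, key=lambda rule: -len(rule[0]))

# The four rules from the FEA example, in declaration order.
declared = [
    (("f", "i"), "f_i"),
    (("f", "f", "f"), "f_f_f"),
    (("f", "f"), "f_f"),
    (("f", "f", "i"), "f_f_i"),
]
print([lig for _, lig in sort_ligature_set(declared)])
# ['f_f_f', 'f_f_i', 'f_i', 'f_f']
```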

In thinking about this, I have been trying to understand @khaledhosny's concerns about the sorting behaviour, specifically by trying to come up with some example of input text + ligature rules where the (unexpected) sorting behaviour could interfere with the designer's intentions, and I'm struggling to come up with any.

My current understanding:

Am I missing anything? Does anyone have an example of an input string and a set of ligature rules where the sorting behaviour would confound the designer's intentions?

I think it would be nice, if the spec is going to suggest sorting, for it to define how that sorting should occur, and I think that a sorting that considers only length and otherwise respects declaration order is the simplest; but I don't think this is hugely important, since as far as I can tell it should have no impact on the shaping behaviour.
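One way to narrow down where declaration order could matter at all: two ligatures in the same set can both match at one position only if one component sequence is a proper prefix of the other. Equal-length distinct sequences, or sequences where neither is a prefix of the other, can never both match. This sketch (hypothetical helpers, assuming no duplicate component sequences) lists the pairs whose relative order affects the result.

```python
from itertools import combinations

def is_proper_prefix(shorter, longer):
    return len(shorter) < len(longer) and longer[:len(shorter)] == shorter

def order_sensitive_pairs(rules):
    """Return ligature pairs whose relative order can change shaping."""
    pairs = []
    for (a, lig_a), (b, lig_b) in combinations(rules, 2):
        if is_proper_prefix(a, b) or is_proper_prefix(b, a):
            pairs.append((lig_a, lig_b))
    return pairs

rules = [
    (("f", "i"), "f_i"),
    (("f", "f", "f"), "f_f_f"),
    (("f", "f"), "f_f"),
    (("f", "f", "i"), "f_f_i"),
]
print(order_sensitive_pairs(rules))
# [('f_f_f', 'f_f'), ('f_f', 'f_f_i')]
```

For these pairs, longest-first sorting always lets the longer ligature win; a designer who wants the shorter one to take priority is exactly the case the sorting overrides.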

anthrotype commented 5 months ago

Thanks Colin for clarifying the non-issue. We should not be talking about the ordering of ligatures in general (as they appear in the feature.fea) but about the order within a given ligature set keyed by first glyph, with the ligature sets themselves always necessarily sorted by the glyph ID of that first glyph, as per the OpenType spec (no matter what the FEA or the font developer says). I agree that not ordering longer ligatures ahead of shorter ones may lead to some becoming unreachable: why even bother having an f_f_i ligature if f_f would always match first?! So it makes sense to keep sorting ligatures within a set by the number of ligature components.

I also now see that even for different ligatures of equal length (within a set), it doesn't really matter in which order they appear: either they will match the input string or they will not. So for these, the only reason for specifying some order is consistency across implementations. We can sort by GID (like makeotf and fea-rs do), by glyph name (like fonttools does), or not sort these (equal-length ligatures with the same first glyph) but keep them in the same order as written in the FEA. I think overall the latter is the least effort for anybody, so +1 to this.
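A brute-force sketch of this argument, using a simplified single-lookup shaper (no lookup flags or mark skipping, an assumption). Every ordering that keeps the two three-glyph ligatures ahead of the two two-glyph ones produces identical output for all strings of `f` and `i` up to length 5. Putting the shorter ones first does not.

```python
from itertools import permutations, product

def apply_ligature_lookup(glyphs, rules):
    """First-match ligature substitution over one ligature set (simplified)."""
    out, i = [], 0
    while i < len(glyphs):
        for components, ligature in rules:
            if tuple(glyphs[i:i + len(components)]) == components:
                out.append(ligature)
                i += len(components)
                break
        else:  # no ligature matched at this position
            out.append(glyphs[i])
            i += 1
    return out

longer = [(("f", "f", "f"), "f_f_f"), (("f", "f", "i"), "f_f_i")]
shorter = [(("f", "i"), "f_i"), (("f", "f"), "f_f")]

# Every order of equal-length ligatures, always longest group first.
orderings = [list(a) + list(b) for a in permutations(longer) for b in permutations(shorter)]
reference = orderings[0]
for n in range(1, 6):
    for text in product("fi", repeat=n):
        expected = apply_ligature_lookup(list(text), reference)
        assert all(apply_ligature_lookup(list(text), o) == expected for o in orderings)
print(f"{len(orderings)} orderings agree on every f/i string up to length 5")
```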