Open khaledhosny opened 7 months ago
I remember a time when the mantra “longer ligatures first” was important. I only found out about the re-ordering when trying to demonstrate this problem in one of my workshops.
I can see how this behavior might be considered a theoretical problem, but I think the benefits outweigh this concern. It seems natural for users to write shorter substitutions first.
That said, do you have a practical example where this re-sorting would cause actual harm?
FWIW, the sorting algorithm seems to be here: https://github.com/adobe-type-tools/afdko/blob/develop/c/makeotf/lib/hotconv/GSUB.c#L1730-L1768
See https://forum.glyphsapp.com/t/prioritizing-certain-ligatures/19433/14 for an example.
I don't see us just removing this part of the spec. Documenting the ordering requirement could be valuable, although there are a lot of things like this in the older parts of the spec and that horse may have left the barn. (We can document what AFDKO does, but that doesn't mean other implementations will update their algorithms if those differ.)
We could add a flag to disable the sorting, but that would operate on a font-wide basis.
Seems like it might be better to add some sort of "explicit" command, similar to "subtable", that blocks any reordering within a lookup at the point where it is used.
> FWIW, the sorting algorithm seems to be here: https://github.com/adobe-type-tools/afdko/blob/develop/c/makeotf/lib/hotconv/GSUB.c#L1730-L1768

This sorts by length and GID, which is double bad. Sorting by length is understandable, though misguided, but sorting by GID makes no sense.
- The sorting algorithm is undocumented, so there is no clear way to verify that implementations are implementing it compatibly.
Case in point, FontTools only sorts by length https://github.com/fonttools/fonttools/blob/fa59ada1b557bc304c592a2ca91c6b99ff6d241d/Lib/fontTools/otlLib/builder.py#L1570
Is the sort by glyphId simply to ensure consistent results between different sort algos?
I don’t think there is any point in sorting by GID, as it changes the meaning of the code and is far worse than sorting by length, since that one is at least potentially desirable.
Right, I was assuming the sort by GID was a secondary sort after the sort by length. Still, that could be confusing if you have some equal-length subs that you need to happen in sequence.
> FontTools only sorts by length

Well, actually it sorts by length first and secondarily sorts alphabetically by the ligature component glyph names. fea-rs, I believe, sorts by length and then GID, similar to makeotf if I understand correctly. I can see situations where the sorting is undesirable altogether. Ideally one should be able to opt out. For the default behavior I suppose we should stick to one officially documented ordering.
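To make the difference between these tie-breaks concrete, here is a small Python sketch. It is not taken from makeotf, fea-rs, or fontTools: the glyph order, the rules, and the assumption that the GID tie-break compares component glyph IDs are all hypothetical and chosen only for illustration.

```python
# Hypothetical glyph order and rules, chosen only for illustration.
glyph_order = ["f", "t", "i", "b"]
gid = {name: index for index, name in enumerate(glyph_order)}

# (components, ligature) pairs in declaration order, all starting with "f".
rules = [
    (("f", "t"), "f_t"),
    (("f", "b"), "f_b"),
    (("f", "i"), "f_i"),
    (("f", "f", "i"), "f_f_i"),
]

def names(sorted_rules):
    """Return just the ligature names, in order."""
    return [ligature for _, ligature in sorted_rules]

# Longest first, ties broken by component GIDs (assumed makeotf/fea-rs style).
by_length_then_gid = sorted(
    rules, key=lambda r: (-len(r[0]), [gid[g] for g in r[0]])
)
# Longest first, ties broken by component glyph names (assumed feaLib style).
by_length_then_name = sorted(rules, key=lambda r: (-len(r[0]), r[0]))
# Longest first, ties keep declaration order (sorted() is stable).
by_length_only = sorted(rules, key=lambda r: -len(r[0]))

print(names(by_length_then_gid))   # ['f_f_i', 'f_t', 'f_i', 'f_b']
print(names(by_length_then_name))  # ['f_f_i', 'f_b', 'f_i', 'f_t']
print(names(by_length_only))       # ['f_f_i', 'f_t', 'f_b', 'f_i']
```

All three agree on putting the longest rule first; they differ only in how the equal-length rules are arranged.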
So I've been revisiting this question along with @anthrotype, because there was a slight difference in the sorting behaviour of fea-rs (rust) and feaLib (python, fonttools) for these ligature rules, and for purposes of testing we try to have these two tools generate the same output wherever it is (ahem) feasible.
Currently, fea-rs matches afdko, but feaLib uses glyph names, not GIDs, to determine the ordering within a given LigatureSet table. We are now looking at standardizing on a single sorting approach that accounts only for length and is stable (preserving the order declared in the input) for ligatures within a ligature set. That is, given the following FEA,
sub f i by f_i;
sub f f f by f_f_f;
sub f f by f_f;
sub f f i by f_f_i;
we will end up with the final ordering,
f_f_f
f_f_i
f_i
f_f
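The ordering above can be reproduced with a minimal Python sketch. This is not the code of fea-rs or feaLib; the helper name `order_ligature_sets` and the tuple representation of rules are assumptions made for illustration. Rules are grouped by first glyph, then each group is sorted by component count, longest first. Python's sort is stable, so equal-length rules keep their declaration order.

```python
from collections import defaultdict

def order_ligature_sets(rules):
    """Group rules by first glyph; sort each set longest-first, stably.

    `rules` is a list of (components, ligature) pairs in declaration order.
    Hypothetical helper, not an API of fea-rs or fontTools.
    """
    sets = defaultdict(list)
    for components, ligature in rules:
        sets[components[0]].append((components, ligature))
    for first in sets:
        # list.sort is stable: ties keep their declaration order.
        sets[first].sort(key=lambda rule: -len(rule[0]))
    return dict(sets)

# The four rules from the FEA snippet, in declaration order.
rules = [
    (("f", "i"), "f_i"),
    (("f", "f", "f"), "f_f_f"),
    (("f", "f"), "f_f"),
    (("f", "f", "i"), "f_f_i"),
]

ordered = order_ligature_sets(rules)
print([ligature for _, ligature in ordered["f"]])
# ['f_f_f', 'f_f_i', 'f_i', 'f_f']
```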
In thinking about this, I have been trying to understand @khaledhosny's concerns about the sorting behaviour, specifically by trying to come up with some example of input text + ligature rules where the (unexpected) sorting behaviour could interfere with the designer's intentions, and I'm struggling to come up with any.
My current understanding:
- One sequence can be a prefix of another (e.g. `f f` is a prefix of `f f i`), in which case, if it occurs earlier in the set, the longer ligature will be unreachable.
- `o f f i` is going to end up in a different ligature set than `f f i`, and will always be applied before `f f i` if it occurs, since the logical cursor will match the `o` before seeing the `f`.

Am I missing anything? Does anyone have an example of an input string and a set of ligature rules where the sorting behaviour would confound the designer's intentions?
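The two cases above can be checked with a simplified first-match simulation in Python. This is only a sketch: `apply_ligatures` is a hypothetical helper that models a single ligature lookup with ligature sets keyed by first glyph, and it ignores lookup flags, skipped glyphs, and everything else a real shaper does.

```python
def apply_ligatures(glyphs, ligature_sets):
    """Simplified single-lookup ligature substitution.

    `ligature_sets` maps a first glyph to an ordered list of
    (components, ligature) pairs; the first matching entry wins.
    """
    out, i = [], 0
    while i < len(glyphs):
        for components, ligature in ligature_sets.get(glyphs[i], []):
            if tuple(glyphs[i:i + len(components)]) == components:
                out.append(ligature)
                i += len(components)
                break
        else:
            # No ligature matched at this position; copy the glyph through.
            out.append(glyphs[i])
            i += 1
    return out

f_f = (("f", "f"), "f_f")
f_f_i = (("f", "f", "i"), "f_f_i")
o_f_f_i = (("o", "f", "f", "i"), "o_f_f_i")

text = ["o", "f", "f", "i"]

# Prefix listed first: the longer ligature can never match.
print(apply_ligatures(text, {"f": [f_f, f_f_i]}))  # ['o', 'f_f', 'i']
# Longer ligature first: both rules are reachable.
print(apply_ligatures(text, {"f": [f_f_i, f_f]}))  # ['o', 'f_f_i']
# A rule starting with "o" lives in its own set and matches first.
print(apply_ligatures(text, {"o": [o_f_f_i], "f": [f_f_i, f_f]}))  # ['o_f_f_i']
```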
I think it would be nice, if the spec is going to suggest sorting, that it define how that sorting should occur, and I think that a sorting that considers only length and otherwise respects declaration order is the simplest; but I don't think this is hugely important, since as far as I can tell it should have no impact on the shaping behaviour.
Thanks Colin for clarifying the non-issue. We should not be talking about ordering of ligatures in general (as they appear in the feature.fea) but the order within a given ligature set keyed by first glyph, with each ligature set always necessarily sorted by the glyphID as per OpenType spec (no matter what FEA or font developer say). I agree that not ordering longer ligatures ahead of shorter ones may lead to some becoming unreachable -- why even bother having an f_f_i ligature if f_f would always match first?! So it makes sense to keep sorting ligatures within a set by the number of ligature components. I also now see that even for different ligatures of equal length (within a set), it doesn't really matter which order they appear in: either they will match the input string or they will not. So for these the only reason for specifying some order is consistency across implementations. We can sort by GID (like makeotf and fea-rs do), by glyph name (like fonttools does), or not sort these (equal-length ligatures with the same first glyph) but keep them in the same order as written in the FEA. I think overall the last option is the least effort for anybody, so +1 to this.
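The claim about equal-length ligatures can be sanity-checked with a short Python sketch. Two distinct component sequences of the same length cannot both match at the same position, so every ordering of the set should pick the same ligature. The helper `first_match` and the sample rules are hypothetical and only model first-match-wins within one ligature set.

```python
from itertools import permutations

def first_match(glyphs, ligature_set):
    """Return the first ligature whose components match the start of `glyphs`."""
    for components, ligature in ligature_set:
        if tuple(glyphs[:len(components)]) == components:
            return ligature
    return None

# Equal-length rules sharing the first glyph "f" (illustrative only).
equal_length = [
    (("f", "i"), "f_i"),
    (("f", "l"), "f_l"),
    (("f", "f"), "f_f"),
]
inputs = [["f", "i"], ["f", "l"], ["f", "f"], ["f", "t"]]

# Collect the per-input results for every possible ordering of the set.
results = {
    tuple(first_match(text, order) for text in inputs)
    for order in permutations(equal_length)
}
print(results)  # {('f_i', 'f_l', 'f_f', None)}
```

Every permutation yields the same result, which is why the tie-break among equal-length rules only matters for reproducible output across compilers, not for shaping.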
The Feature File Specification §5.d, states that:
There are several issues with this:
I think this sorting should be deprecated and dropped, or if back-compatibility is a concern, have a way to disable it.