eggrobin opened 6 months ago
Perhaps including forms such as `d+u200D+en` as options in the candidates window, or, even better, including `u200D` in any candidate that is a 'sequence' (e.g. π¦<u200D>π<u200D>πΌ), would be helpful. This would leave the user in full control of when and where to place typographic ligatures, directly from ππ¨ππΈ, without actually having to bother with the rather annoying `u200D` or `u200C`.
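A minimal sketch of what this proposal would mean for a single candidate, assuming the IME stores each multi-sign candidate as a list of signs (the function name `zwj_candidate` is hypothetical and not part of any existing IME). Every internal boundary gets a ZERO WIDTH JOINER (U+200D); the example uses the d and en signs, U+1202D and U+12097.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER

AN = "\U0001202D"  # CUNEIFORM SIGN AN (transliterated d when used as a determinative)
EN = "\U00012097"  # CUNEIFORM SIGN EN


def zwj_candidate(signs: list[str]) -> str:
    """Hypothetical helper: join the signs of a candidate with ZWJ between each pair."""
    return ZWJ.join(signs)


if __name__ == "__main__":
    # Show the resulting code points for the d+en candidate.
    print([f"U+{ord(c):04X}" for c in zwj_candidate([AN, EN])])
```

Run as a script, this prints `['U+1202D', 'U+200D', 'U+12097']`. Note that the resulting string is no longer equal to the plain sequence `AN + EN`, so text entered this way would not match ZWJ-free text in a naive search.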
I highly doubt that users of cuneiform fonts will (want to) learn about `u200D` and `u200C`, and I assume that otherwise most unwanted ligatures would be broken with a space instead. (And ligatures requiring the user to write a `u200D` wouldn't be used a lot.)
This solution would be significantly more intuitive, and more consistent across fonts, than the fontmaker arbitrarily deciding which ligatures are discretionary and which aren't.[^1]
Further, if there were an update to Unicode, and subsequently to the font, that encodes a former typographic ligature as a unique code point, the fontmaker could change the typographic ligatures with the `u200D` to point to the new unique code point, making the font backwards compatible. Otherwise the font would have to keep a now-unnecessary ligature to achieve the same effect.
[^1]: Especially as the typographical ligatures may not be exactly intuitive in the first place.
Apologies for the delayed response. I had partially responded to this on Discord back in January, but I should probably write something down here too (especially since interesting examples have recently been brought to my attention).
> including `u200D` in any candidate that is a 'sequence' (e.g. π¦<u200D>π<u200D>πΌ) would be helpful
ZWJing up every diri is probably not a good idea, as it ends up working against the goals of the encoding model for cuneiform. One major underlying goal of the encoding model is to be compatible with common transliteration practices[^compatibility]. For many of these sequences, it is common practice for the transliteration to be given as a sequence, even if a ligature occurs.
To take a concrete example, the diri sign ππ has a distinct shape in Hellenistic Uruk[^enrique], see, e.g., https://www.ebl.lmu.de/fragmentarium/MLC.1874 o 4.
However, that ligated ππ is also used in cases where it is transliterated (and thus would be typed) si-a, such as https://cdli.mpiwg-berlin.mpg.de/artifacts/348467/reader/65783 o 1, or http://oracc.org/blms/P348565.28 r 13′. This last example (Examenstext A) is particularly interesting, as it is lemmatized with the morphology na.m:~;a, and is a witness to a composite which also has a Neo-Assyrian witness (where there is no ligature). The best way to accommodate both transliteration/input practice and the handling of encoded composite text (such as that π πΊβπ πͺβπππ from Examenstext A) is for a Hellenistic Uruk font to have a default ligature for ππ (with no ZWJ involved).
> Further, if there were an update to Unicode, and subsequently to the font, that encodes a former typographic ligature as a unique code point, the fontmaker could change the typographic ligatures with the `u200D` to point to the new unique code point, making the font backwards compatible. Otherwise the font would have to keep a now-unnecessary ligature to achieve the same effect.
Any change to the encoding model is tremendously disruptive to users and implementers at all levels: encoded corpora would need to be updated (in some cases, transliterations may be invalidated, see above), fonts need to be updated to support the new characters, new text will fail to match old text in search. I think the UTC would not lightly make such additions; the Unicode 7.0 additions were a fairly special case[^na] as they included clear contrasts (π¨ vs. π), and these were still somewhat disruptive; even today one still occasionally finds some bad pre-7.0 encodings.
> the fontmaker arbitrarily deciding which ligatures are discretionary and which aren't.
It’s not the fontmaker being arbitrary, it’s the second-millennium scribe! :-) More seriously, those ligatures are often a property of the style, not of the text (which can exist independently of the style of a particular attestation, as in composite texts, words cited in reference works, etc.), and the style is inherently up to the font. The fontmaker should therefore add as default ligatures those that are almost always used in the target style. For instance, if in some cursive style 𒀭𒂗 is nearly always ligated, it could be appropriate to have a ligature for that sequence even without the ZWJ.
On the other hand, for an Ur III lapidary font, this would be best treated as a discretionary ligature (in which case it could also be controlled by the presence of a ZWJ).
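To make the default vs. discretionary distinction concrete, here is a toy model, assuming a shaper that applies a style's default ligatures to the bare pair and applies its discretionary ligatures only when a ZWJ sits between the signs (the ZWJ hinting that DUTR #56 suggests). This is not how HarfBuzz or any real OpenType engine is implemented; the `shape` function, the `lig:` glyph labels, and the two style configurations are hypothetical.

```python
ZWJ = "\u200D"   # ZERO WIDTH JOINER: requests a ligature
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER: keeps signs apart

AN = "\U0001202D"  # CUNEIFORM SIGN AN (d)
EN = "\U00012097"  # CUNEIFORM SIGN EN

Pair = tuple[str, str]


def shape(text: str, default_ligs: set[Pair], discretionary_ligs: set[Pair]) -> list[str]:
    """Toy shaper returning a list of glyph labels for a two-sign ligature model."""
    glyphs: list[str] = []
    i = 0
    while i < len(text):
        a = text[i]
        # Default ligature: applies to the bare pair, no ZWJ needed.
        if i + 1 < len(text) and (a, text[i + 1]) in default_ligs:
            glyphs.append(f"lig:{a}{text[i + 1]}")
            i += 2
            continue
        # Any ligature the style knows about can be requested with an explicit ZWJ.
        if (i + 2 < len(text) and text[i + 1] == ZWJ
                and (a, text[i + 2]) in (default_ligs | discretionary_ligs)):
            glyphs.append(f"lig:{a}{text[i + 2]}")
            i += 3
            continue
        # Leftover format characters have no visible glyph in this model.
        if a not in (ZWJ, ZWNJ):
            glyphs.append(a)
        i += 1
    return glyphs


# Hypothetical style configurations for the d+en pair.
UR_III_LAPIDARY = (set(), {(AN, EN)})  # discretionary only
CURSIVE = ({(AN, EN)}, set())          # ligated by default
```

With the lapidary configuration, `shape(AN + EN, *UR_III_LAPIDARY)` keeps the two signs apart and `shape(AN + ZWJ + EN, *UR_III_LAPIDARY)` ligates them; with the cursive configuration, the bare `AN + EN` already ligates, and a ZWNJ between the signs blocks it.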
> I highly doubt that users of cuneiform fonts will (want to) learn about `u200D` and `u200C`
> […] ligatures requiring the user to write a `u200D` wouldn't be used a lot.
Of course no user should have to know about the ZWJ, let alone type it (I am advising users to type zero-width spaces, and providing a way to do so, but at least those have a pretty tangible effect). These implementation details should be hidden from the user (which is what this issue is about, from the IME side), so for relatively standard discretionary ligatures, such as that Ur III lapidary 𒀭𒂗, a `d+en` composition should be added.
[^compatibility]: While this was not clear from the text of the Unicode Standard until recently, it was well understood in the proposal documents. The 16.0β review draft includes updated text in Chapter 11 clarifying this: https://unicode.org/versions/Unicode16.0.0/core-spec/chapter-11/#G26959.
[^enrique]: This was recently brought to my attention by Enrique Jiménez.
[^na]: Of course, another aspect here is that Neo-Assyrian has a very special status in many reference materials, where it otherwise tends not to require ligatures. Major reference works sometimes shape the encoding model in unexpected ways; I am reminded of this old discussion about CJKV Extension B, which I came across recently while perusing the Unicode mailing list archives: https://www.unicode.org/mail-arch/unicode-ml/y2004-m06/0223.html.
DUTR #56 suggests the use of ZWJ to hint ligaturing, see https://www.unicode.org/reports/tr56/#Discretionary_Ligatures.
@crzfub, who has been developing a font that supports those ligatures, pointed out that typing a ZWJ can be tricky. It could make sense to add support for compositions such as `d+en` for 𒀭𒂗 (U+1202D U+200D U+12097); and of course one should also have the various `d+suen`, `d+ellil`, etc.
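A minimal sketch of such a composition entry, assuming the IME keeps a table from typed input to encoded text (the `COMPOSITIONS` table and `compose` function are hypothetical, not the IME's actual data structures). Only the `d+en` entry is filled in, with the code points given above; `d+suen`, `d+ellil`, etc. are left out because their exact encodings are not spelled out here.

```python
from typing import Optional

ZWJ = "\u200D"  # ZERO WIDTH JOINER, hidden from the user by the composition

# Hypothetical composition table: typed input -> encoded text.
COMPOSITIONS: dict[str, str] = {
    # d+en: U+1202D U+200D U+12097, requesting the discretionary ligature.
    "d+en": "\U0001202D" + ZWJ + "\U00012097",
    # d+suen, d+ellil, etc. would be added the same way.
}


def compose(typed: str) -> Optional[str]:
    """Return the encoded text for a composition, or None if there is none."""
    return COMPOSITIONS.get(typed)
```

For example, `[f"U+{ord(c):04X}" for c in compose("d+en")]` yields `['U+1202D', 'U+200D', 'U+12097']`, so the user types `d+en` and never sees the ZWJ, while an unknown key returns `None` and the IME can fall back to its ordinary candidates.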