MicrosoftDocs / typography-issues

Creative Commons Attribution 4.0 International

[USE] Ambiguous interaction of CGJ and join controls #216

Open dscorbett opened 5 years ago

dscorbett commented 5 years ago

CGJ “may occur anywhere in a cluster with no effect”. ZWNJ “continues a preceding cluster but causes a cluster break after itself when the following character is not a mark character (gc=Mn or gc=Mc)”, but CGJ has gc=Mn. How should the sequence ⟨ZWNJ CGJ non-mark⟩ be broken? If the section about ZWNJ takes precedence, the ZWNJ precedes a mark character, so there is no cluster break. If the section about CGJ takes precedence, the CGJ is ignored, so the ZWNJ is effectively before a non-mark, so there is a cluster break.
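The two readings can be made concrete with a minimal Python sketch. The function name `zwnj_breaks_after` and the `ignore_cgj` flag are hypothetical, made up for this illustration; they are not taken from the USE spec or from any shaping engine, and end-of-text handling is an assumption.

```python
import unicodedata

ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER
CGJ = "\u034F"   # COMBINING GRAPHEME JOINER


def is_mark(ch: str) -> bool:
    # The quoted ZWNJ rule defines a mark character as gc=Mn or gc=Mc.
    return unicodedata.category(ch) in ("Mn", "Mc")


def zwnj_breaks_after(text: str, i: int, ignore_cgj: bool) -> bool:
    """Return True if the ZWNJ at text[i] causes a cluster break after itself.

    ignore_cgj=False: the ZWNJ rule sees CGJ as an ordinary gc=Mn mark.
    ignore_cgj=True:  CGJ is skipped before the ZWNJ rule is applied.
    """
    assert text[i] == ZWNJ
    j = i + 1
    if ignore_cgj:
        while j < len(text) and text[j] == CGJ:
            j += 1
    if j >= len(text):
        # Assumption: nothing follows, so this sketch reports no break.
        return False
    return not is_mark(text[j])


seq = ZWNJ + CGJ + "a"
print(unicodedata.category(CGJ))          # Mn
print(zwnj_breaks_after(seq, 0, False))   # False: ZWNJ rule takes precedence
print(zwnj_breaks_after(seq, 0, True))    # True: CGJ rule takes precedence
```

The same input gives opposite answers depending on the flag, which is the ambiguity in question.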

ZWJ “continues a preceding cluster and joins it to a following character unless the following character is another ZWJ”. How should the sequence ⟨ZWJ CGJ ZWJ⟩ be broken?
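The ZWJ case can be sketched the same way. Again, `zwj_joins_following` and `ignore_cgj` are hypothetical names used only for illustration, and the end-of-text behavior is an assumption rather than something the spec text states.

```python
ZWJ = "\u200D"  # ZERO WIDTH JOINER
CGJ = "\u034F"  # COMBINING GRAPHEME JOINER


def zwj_joins_following(text: str, i: int, ignore_cgj: bool) -> bool:
    """Return True if the ZWJ at text[i] joins the cluster to what follows.

    Per the quoted rule, a ZWJ joins to the following character unless
    that character is another ZWJ. With ignore_cgj=True, any CGJ is
    skipped before deciding what "the following character" is.
    """
    assert text[i] == ZWJ
    j = i + 1
    if ignore_cgj:
        while j < len(text) and text[j] == CGJ:
            j += 1
    if j >= len(text):
        # Assumption: with nothing to join to, report no join.
        return False
    return text[j] != ZWJ


seq = ZWJ + CGJ + ZWJ
print(zwj_joins_following(seq, 0, False))  # True: next character is CGJ
print(zwj_joins_following(seq, 0, True))   # False: next visible character is ZWJ
```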


behdad commented 5 years ago

@PeterCon can you please assign this to Andrew Glass?

xadxura commented 5 years ago

@dscorbett, @behdad: CGJ has two purposes:

  1. To give a user control of the placement of combining signs in relation to a base character which would otherwise be subject to reordering through normalization (e.g., U+05D4 U+05BD U+034F U+05B2).
  2. To distinguish sequences of characters that are otherwise graphically identical (e.g., U+00E4 from U+0061 U+034F U+0308).

To determine the relative precedence of ZWJ/ZWNJ and CGJ, one must therefore ask in what scenarios the combination could reasonably occur. I've just looked at ZWJ/ZWNJ usage in Indic clusters and in joining contexts, and reviewed the Hebrew CGJ case (1) and the Latin disambiguation case (2). I can't think of a compelling case where the co-occurrence of CGJ and ZWJ/ZWNJ is required, so I don't think it matters which way the precedence falls. For the sake of resolving the ambiguity in the spec, I will say that CGJ takes precedence, i.e., CGJ is ignored and the ZWJ/ZWNJ should not see it when determining a cluster break.
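For reference, both purposes of CGJ and the resolution can be checked with a short Python sketch using the standard `unicodedata` module. The helper `hide_cgj` is a hypothetical name for illustration only; it models "the ZWJ/ZWNJ should not see CGJ" as a simple filter and is not how any particular shaping engine is known to implement it.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER


def nfc(s: str) -> str:
    return unicodedata.normalize("NFC", s)


# Purpose 1: CGJ blocks canonical reordering of combining marks.
# U+05BD (ccc=22) before U+05B2 (ccc=12) is reordered without CGJ.
he_without_cgj = "\u05D4\u05BD\u05B2"
he_with_cgj = "\u05D4\u05BD\u034F\u05B2"
reordered = nfc(he_without_cgj)           # U+05D4 U+05B2 U+05BD
preserved = nfc(he_with_cgj) == he_with_cgj

# Purpose 2: CGJ keeps a sequence distinct from its precomposed form.
composed = nfc("a\u0308")                 # U+00E4
kept_apart = nfc("a\u034F\u0308")         # unchanged, no composition


def hide_cgj(text: str) -> str:
    """Return the text as the join-control rules would see it.

    Models the stated resolution: CGJ is ignored, so the ZWJ/ZWNJ
    cluster-break rules are applied to the text with CGJ filtered out.
    """
    return text.replace(CGJ, "")


print(reordered == "\u05D4\u05B2\u05BD")   # True
print(preserved)                           # True
print(composed == "\u00E4")                # True
print(kept_apart == "a\u034F\u0308")       # True
print(hide_cgj("\u200C\u034Fa") == "\u200Ca")  # True
```

Under this reading, ⟨ZWNJ CGJ non-mark⟩ is treated like ⟨ZWNJ non-mark⟩, so the ZWNJ causes a cluster break, and ⟨ZWJ CGJ ZWJ⟩ is treated like ⟨ZWJ ZWJ⟩.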