Crissov / unicode-proposals

Proposals for new characters to encode and canonic character sequences to register
https://crissov.github.io/unicode-proposals/
Creative Commons Zero v1.0 Universal

Flags of England, Scotland, Wales (and Northern Ireland) emojis #358

Closed Crissov closed 7 years ago

Crissov commented 7 years ago

Emojis for the flags of the "home nations" of the United Kingdom of Great Britain and Northern Ireland are among the most frequently requested: England GB-ENG, Scotland GB-SCT, Wales GB-WLS or GB-CYM and Northern Ireland GB-NIR in ISO 3166-2:GB. Unlike other subregion flags, these are frequently used at international team sports events (except the Olympics and some others), because these four entities, in ISO 3166-2 terminology 3 countries and 1 province (NIR), participate individually. In some cases, Northern Ireland has a joint team with the Republic of Ireland IE. Furthermore, Northern Ireland no longer has an officially recognized flag, but the Ulster Banner or the Flag of Ulster usually takes its place. FIFA :soccer: uses the three-letter codes ENG, SCO, WAL and NIR, respectively.

The Flag of England is a red orthogonal cross on a white canvas. The Flag of Scotland is a white saltire (i.e. diagonal cross) on a blue canvas. The Flag of Wales shows a red dragon atop a white and a green horizontal stripe. The Ulster Banner is the English flag with a red hand in a white star in its center and a crown above it. The Flag of Ulster is a red orthogonal cross on a yellow canvas with a red hand in a white shield in its center. St. Patrick's Saltire is a diagonal red cross on a white canvas and represents the whole of Ireland in the flag of the United Kingdom, although it never represented a sovereign Irish state.

So, everyone agrees these 3 or 4 are somewhat special, but the Unicode Consortium expressly did not want to be involved in deciding which regions are country-like and therefore deserve an emoji flag. It adopted ISO 3166-1 alpha-2 codes as the basis for RIS (Regional Indicator Symbol) pairs, although that standard was never intended for such use; alas, there is no better one. ISO inherits its criteria for code-worthiness from the UN Statistics Division, which developed the M.49 standard (primarily with numeric codes). The UN had some need to distinguish overseas dependencies from their homelands, so every entry is recorded with a binary attribute, independent. Unicode could have excluded or discouraged dependent codes from emoji flags, but it did not, and this is why flags for some (almost) uninhabited islands now clutter almost all implementations.

There were several options for encoding these much-desired flag emojis.

  1. RI pair like proper countries
    1. using private-use codes (AA, QM..QZ, XA..XZ, ZZ) reserved for purposes like this, e.g. XE, XS, XW and XI or XN (and already established XK for Kosovo):
      🇽🇪, 🇽🇸, 🇽🇼, 🇽🇮 / 🇽🇳
    2. using non-assigned codes in violation of the ISO standard, e.g. EN, AB, WA:
      🇪🇳, 🇦🇧, 🇼🇦
  2. longer RI sequence with subregion code, e.g. GBENG:
    🇬🇧🇪🇳🇬, 🇬🇧🇸🇨🇹, 🇬🇧🇼🇱🇸, 🇬🇧🇳🇮🇷
  3. new RI characters
    1. subregion letters and a separator, GB-eng, GB-sct, GB-wls, GB-nir
    2. just a hyphen separator, GB-ENG, GB-SCT, GB-WLS, GB-NIR
    3. just 26 subregion letters, GBeng, GBsct, GBwls, GBnir
  4. a new, invisible Region Code Joiner: G+B+E+N+G
  5. a handful of new individual characters, e.g. Flag with Saltire
  6. ZWJ sequences
    1. Waving White Flag :white_flag: base plus determiner, like :rainbow_flag: = :white_flag:+:rainbow::
      🏳️‍🦁, 🏳️‍🦄, 🏳️‍🐉 / 🏳️‍🐲, 🏳️‍🦌
    2. Waving Black Flag :black_flag: base plus determiner, like Twitter's Jolly Roger pirate flag :black_flag:+:skull_and_crossbones::
      🏴‍🦁, 🏴‍🦄, 🏴‍🐉 / 🏴‍🐲, 🏴‍🦌
    3. Union Jack :gb: base plus determiner:
      🇬🇧‍🦁, 🇬🇧‍🦄, 🇬🇧‍🐉 / 🇬🇧‍🐲, 🇬🇧‍🦌
    4. Union Jack :gb: plus subregion RI sequence, i.e. GB+ENG or, more robust, GB+E+N+G:
      🇬🇧‍🇪🇳🇬, 🇬🇧‍🇸🇨🇹, 🇬🇧‍🇼🇱🇸, 🇬🇧‍🇳🇮🇷 / 🇬🇧‍🇪‍🇳‍🇬, 🇬🇧‍🇸‍🇨‍🇹, 🇬🇧‍🇼‍🇱‍🇸, 🇬🇧‍🇳‍🇮‍🇷
    5. individual RI letters, each joined by ZWJ so they cannot form pairs, i.e. G+B+E+N+G or FIFA's E+N+G:
      🇬‍🇧‍🇪‍🇳‍🇬, 🇬‍🇧‍🇸‍🇨‍🇹, 🇬‍🇧‍🇼‍🇱‍🇸, 🇬‍🇧‍🇳‍🇮‍🇷 / 🇪‍🇳‍🇬, 🇸‍🇨‍🇴, 🇼‍🇦‍🇱, 🇳‍🇮‍🇷
    6. arbitrary pair (or longer sequence) of emojis, e.g. :heavy_plus_sign::heart: or :x::blue_heart: or :negative_squared_cross_mark::large_blue_diamond:
  7. Wait for the dissolution of the United Kingdom.
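
To make the RI and ZWJ options above more concrete, here is a minimal Python sketch of how options 1.i and 6 would be built at the code point level. The helper names `regional_indicators` and `zwj_sequence` are made up for illustration; only the code points themselves (Regional Indicator letters starting at U+1F1E6, ZWJ at U+200D, the flag bases U+1F3F3 and U+1F3F4) come from Unicode.

```python
RI_BASE = 0x1F1E6          # REGIONAL INDICATOR SYMBOL LETTER A
ZWJ = "\u200D"             # ZERO WIDTH JOINER
WHITE_FLAG = "\U0001F3F3\uFE0F"  # waving white flag + emoji presentation selector
BLACK_FLAG = "\U0001F3F4"        # waving black flag


def regional_indicators(code: str) -> str:
    """Map ASCII letters A-Z to the matching Regional Indicator symbols."""
    code = code.upper()
    if not (code.isascii() and code.isalpha()):
        raise ValueError(f"expected ASCII letters only, got {code!r}")
    return "".join(chr(RI_BASE + ord(c) - ord("A")) for c in code)


def zwj_sequence(*parts: str) -> str:
    """Join emoji (or single RI letters) with ZERO WIDTH JOINER."""
    return ZWJ.join(parts)


# Option 1.i: private-use RI pair, e.g. XE for England.
england_ri = regional_indicators("XE")

# Option 6.ii: black flag base plus a determiner (here: lion).
england_zwj = zwj_sequence(BLACK_FLAG, "\U0001F981")

# Option 6.v: single RI letters kept apart by ZWJ so they cannot pair up.
england_letters = zwj_sequence(*regional_indicators("GBENG"))
```

None of these sequences (apart from the existing rainbow flag pattern of white flag + ZWJ + rainbow) is registered, so fonts would show the fallback parts rather than a single flag.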

For the determiners in 6., some existing metaphors could have been reused or new ones could have been derived from the flag design or the national symbolism. In the case of the British countries, animals have proven quite popular: :lion:, :unicorn:, :dragon: / :dragon_face: and :deer:. The deer, introduced in 2016, appears much less frequently than the others, though; a four-leaf clover :four_leaf_clover: or shamrock :shamrock: is the more popular choice. For England and Scotland at least, a particular cross could have been used, e.g. :latin_cross: or :heavy_plus_sign: and :x:, :heavy_multiplication_x: or :negative_squared_cross_mark:. (There would also be more appropriate non-emoji characters in Unicode.) A (red) hand :hand: would also symbolize Northern Ireland (or Ulster; note: there's no harp emoji #333). Also see #329 for a discussion of possible color swatch emojis which could be used in ZWJ sequences.

WhatsApp pioneered option 1.i., which is fully compliant with all affected standards. This approach would be limited to 30-something entries.

The British national standardization body, the BSI, is still preparing a proposal to ISO that would change the scope and rules of ISO 3166 so that the codes mentioned in 1.ii. could be formally registered. This would inevitably also lead to new top-level domains (ccTLD), much like the unrealistic option 7.

Option 2 has problems with backwards compatibility in existing implementations, because they cannot decide without more context whether a preceding RI is the right-hand side of another pair or the left-hand side of the current pair. That means GBENG could be rendered erroneously and confusingly as G:belgium::nigeria: or, more likely, :gb:E:nigeria:. Options 3., 4. and 6.iv. try to overcome this design mistake in RISs.
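
The ambiguity is easy to reproduce with a toy model. The sketch below is an assumption about how a simple renderer might behave, not a description of any real implementation: it scans left to right, turns two letters into a flag if the pair is a code it knows, and otherwise emits a lone letter. Plain ASCII letters stand in for RI characters, and `ASSIGNED` is a deliberately tiny, hypothetical table.

```python
from typing import List

# Hypothetical subset of ISO 3166-1 alpha-2 codes known to the renderer.
ASSIGNED = {"GB", "BE", "NG"}


def greedy_pairs(letters: str) -> List[str]:
    """Split a run of RI letters into known pairs and leftover single letters."""
    out = []
    i = 0
    while i < len(letters):
        pair = letters[i:i + 2]
        if len(pair) == 2 and pair in ASSIGNED:
            out.append(pair)        # rendered as a flag
            i += 2
        else:
            out.append(letters[i])  # rendered as a lone letter
            i += 1
    return out


# Scanning from the start of the run:
from_start = greedy_pairs("GBENG")      # ['GB', 'E', 'NG']

# Scanning from one letter in, e.g. after a line break or a deleted G:
from_offset = greedy_pairs("GBENG"[1:])  # ['BE', 'NG']
```

The same five letters thus yield either :gb:E:nigeria: or, with the leading G left over, :belgium::nigeria:, depending only on where pairing starts.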

Option 5 was already decided against when flag emojis were first encoded. Compatibility with Japanese emojis would have required just 10 individually encoded flags, :gb: among them. Support for the respective PUA codes is still found in Google and Apple products.

The Emoji Subcommittee and the Unicode Consortium as a whole adopted a mix of 6.ii. and 3.i., based upon an earlier proposal named UTS#52 and a subsequent variant named TERIS, which all use invisible Tag characters U+E00xy attached to a generic flag emoji. The solution makes it possible to encode any ISO 3166-2 subregion. The fallback should be something like a flag with a question mark on top, but effectively is just a black flag :black_flag: in all cases: 🏴󠁧󠁢󠁥󠁮󠁧󠁿, 🏴󠁧󠁢󠁳󠁣󠁴󠁿, 🏴󠁧󠁢󠁷󠁬󠁳󠁿 / 🏴󠁧󠁢󠁣󠁹󠁭󠁿, 🏴󠁧󠁢󠁮󠁩󠁲󠁿. Yes, that's bad. Every other seriously proposed solution provides better fallback behavior. What's worse, UTS#51 v5 only recommends support for the English, Scottish and Welsh flags. (A lot of people read that as a prescriptive mandate, not a descriptive observation.)
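
For reference, here is a small Python sketch of how such an emoji tag sequence is assembled: black flag U+1F3F4, then one tag character per lowercase letter or digit of the subdivision code (the ASCII value offset by U+E0000), then CANCEL TAG U+E007F. The function names `subdivision_flag` and `strip_tags` are invented for this example; stripping the tags only imitates what a system without support ends up displaying.

```python
import string

BLACK_FLAG = "\U0001F3F4"   # base character of every subdivision flag
TAG_OFFSET = 0xE0000        # tag character = ASCII code point + this offset
CANCEL_TAG = "\U000E007F"   # terminates the tag sequence

_ALLOWED = set(string.ascii_lowercase + string.digits)


def subdivision_flag(code: str) -> str:
    """Build the tag sequence for an ISO 3166-2 code such as 'GB-ENG'."""
    key = code.replace("-", "").lower()
    if not key or not set(key) <= _ALLOWED:
        raise ValueError(f"unexpected subdivision code: {code!r}")
    tags = "".join(chr(TAG_OFFSET + ord(c)) for c in key)
    return BLACK_FLAG + tags + CANCEL_TAG


def strip_tags(sequence: str) -> str:
    """Drop all tag characters, leaving what an unaware renderer shows."""
    return "".join(c for c in sequence if not 0xE0000 <= ord(c) <= 0xE007F)


england = subdivision_flag("GB-ENG")
scotland = subdivision_flag("GB-SCT")
wales = subdivision_flag("GB-WLS")
```

`strip_tags(england)`, `strip_tags(scotland)` and `strip_tags(wales)` all come out as the same bare black flag, which is exactly the fallback problem: the invisible tags carry all the information.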

Petitions

Scotland/Alba or Saltire or St. Andrew Cross Flag

Wales/Cymru or Dragon Flag

Northern Ireland Flag

England or St. George Cross Flag

Proposals

Specific References

Standard References

Crissov commented 7 years ago

Unicode Emoji 5.0 has been released with unchanged Emoji Tag Sequences. Twitter Twemoji 2.3 and Google Android Oreo Noto Emoji support the English, Scottish and Welsh flags. WhatsApp continues to use RIS 🇽🇪 (XE), 🇽🇸 (XS) and 🇽🇼 (XW); since late 2017 it has also used 🇽🇹 (XT) for Texas 🏴󠁵󠁳󠁴󠁸󠁿.