Raku / old-issue-tracker

Tickets from RT
https://github.com/Raku/old-issue-tracker/issues

Emoji sequences counted as multiple characters instead of a single character #4945

Closed p6rt closed 7 years ago

p6rt commented 8 years ago

Migrated from rt.perl.org#127047 (status was 'resolved')

Searchable as RT127047$

p6rt commented 8 years ago

From @lizmat

[22:52:03] folks, so am I getting it right that emoji skin color modifier should kinda combine with the previous character? So that we get 1 when we do .chars?
[22:52:33] AlexDaniel: it probably should, yes
[22:53:54] what does the unicode say about this? Should it actually be treated as one grapheme? That would make sense
[22:55:10] AlexDaniel: AFAICT Unicode's current definition of "extended grapheme cluster" may not cover the skin color modifiers; if so, feel free to file a bug with them :)
[22:55:35] “When a human emoji is not immediately followed by a emoji modifier character, it should use a generic, non-realistic skin tone, such as: #3399CC”
[22:55:41] this is just nuts…
[22:56:17] Ox0dea: they are wrong. Throw a combiner into them and they'll say the same thing
[22:56:29] RabidGravy: it's fucking blue
[22:57:19] <chansen_> AlexDaniel: why do you think I ask you? Unicode is insane! ;o)
[22:57:39] AlexDaniel: You're right.
[22:57:58] thats a pretty blue
[22:58:55] “As to hair color, dark hair tends to be more neutral, because people of every skin tone can have black (or very dark brown) hair—however, there is no requirement for any particular hair color. One exception is PERSON WITH BLOND HAIR, which needs to have blond hair regardless of skin tone.”
[22:59:06] Just read this, it is hilarious: http://unicode.org/reports/tr51/#Emoji_Modifiers_Table
[22:59:35] lol
[22:59:35] rofl unicode
[23:01:25] it looks like the skin tone modifiers are Grapheme_Base, and not Grapheme_Extend like I'd expect, for some odd reason
[23:01:28] yes, actually
[23:01:31] That's a rhetorical question.
[23:01:42] Skarsnik, what's the use case?
[23:01:51] m: my $f = "\x1F466\x1F3FE"; say "$f $f.chars()" # this feels like a bug
[23:01:52] <+camelia> rakudo-moar 9441bb: OUTPUT«👦🏾 2␤»
[23:01:52] “In real multi-person groupings, the members may have a variety of skin tones.” – does it mean that KISS character should have one person black and another one white?
[23:04:20] ok, I give up, I have no idea how this could possibly work
[23:04:22] plus, how are you supposed to get notifications? You could have something listen to a bell character, but you wouldn't get the text and sender in the notification
[23:04:30] AlexDaniel: will submit rakudobug
[23:04:45] lizmat: oh, nice

FWIW, on http://unicode.org/reports/tr51/, paragraph 2.2.3 states: "A supported emoji modifier sequence should be treated as a single grapheme cluster for editing purposes (cursor movement, deletion, etc.); word break, line break, etc." so it feels to me that this should say 1 instead of 2:

my $f = "\x1F466\x1F3FE"; say "$f $f.chars()"

Liz
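
For readers following along, a minimal sketch of the distinction at stake: `.codes` counts codepoints, while `.chars` counts graphemes. The expected value of 1 for `.chars` assumes a Rakudo build that includes the fix reported in this ticket; the build shown in the log above (9441bb) printed 2. The variable name `$f` follows the one-liner above; the `\x[...]` bracketed escape form is used here only for readability.

```raku
# U+1F466 BOY followed by U+1F3FE EMOJI MODIFIER FITZPATRICK TYPE-5
my $f = "\x[1F466]\x[1F3FE]";

# Codepoints: always two, since the string holds a base and a modifier.
say $f.codes;

# Graphemes: 1 when the modifier is clustered with its base
# (assumption: a Rakudo that includes the fix from this ticket).
say $f.chars;

# List the individual codepoints for inspection.
say $f.ords.map(*.fmt('U+%04X')).join(' ');
```

If `.chars` still reports 2, the installed Rakudo most likely predates the fix.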

p6rt commented 7 years ago

From @samcv

The emoji mentioned in this ticket now pass. Some emoji with skin modifiers do not pass yet, but those sequences include ZWJ, so I am going to close this issue.

We pass all of the Emoji v4 which are emoji-sequences.

p6rt commented 7 years ago

@samcv - Status changed from 'new' to 'resolved'