clang-format gets pretty confused by emojis in string literals #29503

Open Quuxplusone opened 8 years ago

Quuxplusone commented 8 years ago
| Field | Value |
| --- | --- |
| Bugzilla Link | PR30530 |
| Status | NEW |
| Importance | P normal |
| Reported by | Nico Weber (nicolasweber@gmx.de) |
| Reported on | 2016-09-26 15:37:58 -0700 |
| Last modified on | 2016-09-26 17:00:06 -0700 |
| Version | unspecified |
| Hardware | PC Linux |
| CC | djasper@google.com, klimek@google.com, llvm-bugs@lists.llvm.org |
| Fixed by commit(s) | |
| Attachments | |
| Blocks | |
| Blocked by | |
| See also | |

We have various tests in Blink that exercise processing of emojis (or other
non-BMP Unicode characters). Here's one of the tests as processed by clang-format,
compared to the same test with each emoji replaced with an ASCII character:

thakis@thakis:~/src/chrome/src$ cat test.cc
TEST_F(SymbolsIteratorTest, AllEmojiZWSSequences)
{
    CHECK_RUNS({ { "abcdefghijklmnopqrstuvwxyzabcdefghij"
        "klmnopqrstuvxyzabcdefghijklmnopqrstuvwxyzabcdefghilklmnopqrstuvwx"
        "yzabcdefghijklmnopqrstuvwxyzabcdef",
        FontFallbackPriority::EmojiEmoji } });
}

TEST_F(SymbolsIteratorTest, AllEmojiZWSSequences)
{
    CHECK_RUNS({ { "
Quuxplusone commented 8 years ago

...wow, Bugzilla gets pretty confused by emojis in string literals as well!

Quuxplusone commented 8 years ago

I uploaded the bug report to https://bugs.chromium.org/p/chromium/issues/detail?id=650391 as well, where it's displayed correctly.

Quuxplusone commented 8 years ago

Also, clang-format shouldn't break string literals at zero-width joiner (ZWJ) characters, so that it doesn't split up emojis like http://emojipedia.org/family-man-woman-girl/ . Hm, flags are just two code points without a joiner: http://emojipedia.org/flag-for-ascension-island/

clang-format's current behavior comes close to causing data loss for emojis in literals: it changes how editors can interpret the string.
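
To make the constraint above concrete, here is a minimal sketch of how break positions in a UTF-8 literal could be filtered so that neither a ZWJ sequence nor a two-code-point flag gets split. This is not clang-format's implementation; `decodeUtf8`, `isRegionalIndicator` and `safeBreakOffsets` are hypothetical names made up for this illustration. It assumes well-formed UTF-8 input and deliberately ignores the rest of Unicode grapheme cluster segmentation (combining marks, variation selectors, skin-tone modifiers).

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: decodes one code point from well-formed UTF-8 starting
// at byte offset `i` and advances `i` past it. No validation is performed.
static char32_t decodeUtf8(const std::string &s, std::size_t &i) {
  assert(i < s.size());
  unsigned char b = static_cast<unsigned char>(s[i++]);
  if (b < 0x80)
    return b;
  int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
  char32_t cp = b & (0x3F >> extra); // keep only the payload bits of the lead byte
  while (extra-- > 0 && i < s.size())
    cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
  return cp;
}

// Regional indicator symbols (U+1F1E6..U+1F1FF) form flags in pairs.
static bool isRegionalIndicator(char32_t cp) {
  return cp >= 0x1F1E6 && cp <= 0x1F1FF;
}

// Returns the byte offsets (excluding 0) at which the string could be split
// without separating a ZWJ sequence or a regional-indicator pair.
std::vector<std::size_t> safeBreakOffsets(const std::string &utf8) {
  const char32_t kZwj = 0x200D;
  std::vector<std::size_t> offsets;
  char32_t prev = 0;
  std::size_t riRun = 0; // length of the regional-indicator run ending at `prev`
  std::size_t i = 0;
  while (i < utf8.size()) {
    const std::size_t start = i;
    const char32_t cp = decodeUtf8(utf8, i);
    // A break directly before or after a ZWJ would split the emoji sequence.
    const bool joins = prev == kZwj || cp == kZwj;
    // An odd run length means `cp` is the second half of a flag.
    const bool midFlag = isRegionalIndicator(cp) && riRun % 2 == 1;
    if (start != 0 && !joins && !midFlag)
      offsets.push_back(start);
    riRun = isRegionalIndicator(cp) ? riRun + 1 : 0;
    prev = cp;
  }
  return offsets;
}
```

For the family emoji (man, ZWJ, woman, ZWJ, girl; 18 bytes in UTF-8) surrounded by `a` and `b`, this returns offsets 1 and 19 only, so the whole sequence stays on one line. For two adjacent flags it returns only offset 8, the boundary between the two pairs.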