Quuxplusone / LLVMBugzillaTest

boost::regex match but not std::regex #39024

Open Quuxplusone opened 5 years ago

Quuxplusone commented 5 years ago
Bugzilla Link: PR40052
Status: NEW
Importance: P normal
Reported by: Georg Gast (georg-bsd@schorsch-tech.de)
Reported on: 2018-12-17 07:26:14 -0800
Last modified on: 2018-12-18 00:39:10 -0800
Version: 6.0
Hardware: PC FreeBSD
CC: llvm-bugs@lists.llvm.org, mclow.lists@gmail.com
Fixed by commit(s):
Attachments:
- bug.cpp (852 bytes, text/x-c++src)
- bug.cpp (987 bytes, text/x-c++src)
- testReTraits.cpp (618 bytes, text/plain)
- bug.cpp (7220 bytes, text/x-c++src)
- result-csv.txt (674664 bytes, text/plain)
- result-csv.ods (143022 bytes, application/vnd.oasis.opendocument.spreadsheet)
Blocks:
Blocked by:
See also:
Created attachment 21237
Testcase

I have the attached program. It sets a global locale (de_DE.UTF-8).

I think this might be a bug in libc++, because on Windows (MSVC 2013 and
MSVC 2017) and on Linux (gcc 8.2 + libstdc++) this std::regex matches with the
global locale from boost. The regex from boost also matches (replace
std::regex with boost::regex).

The bug triggers only on FreeBSD with clang and libc++ (as on my box), and
only when I use boost::locale. With std::locale() it matches.

I already submitted this bug to FreeBSD and to boost.org.

For reference, here are the links:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=233994

Boost.locale says:
[quote]
Boost.Regex and Boost.Locale aren't related, the locale generated by
Boost.Locale is "C" locale with addons unrelated to Boost.Regex
[/quote]
https://github.com/boostorg/locale/issues/35

FreeBSD says it is a bug in boost.locale.

Since both of my direct upstream bug trackers seem to "dislike" this bug, I am
reporting it to clang/libc++ directly.
Quuxplusone commented 5 years ago

Attached bug.cpp (852 bytes, text/x-c++src): Testcase

Quuxplusone commented 5 years ago
Expected Output of the Testcase:
All ok

Got:
Bug triggered
Quuxplusone commented 5 years ago

My first thought was that you had a "high ascii" character in the test case, and that was getting treated differently by the locale. That appears not to be the case.

Quuxplusone commented 5 years ago

Attached bug.cpp (987 bytes, text/x-c++src): Updated testcase with facets

Quuxplusone commented 5 years ago
I also tried setting the facets one by one. The bug triggers only with:
- all_characters
- collation_facet
- all_categories

I also updated the testcase to show how I did it.
Quuxplusone commented 5 years ago

The bug also goes away if I remove the icase flag from the regex.
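
One plausible reason the icase flag matters: with std::regex::icase, the regex compares characters through std::regex_traits<char>::translate_nocase, which the standard defines in terms of the imbued locale's ctype facet (tolower). Without icase, that facet is not consulted for literal characters. The sketch below is not the attached testcase. It uses only the standard library and std::locale::classic(), and the function name `matches` and the pattern are hypothetical. To reproduce the report, the locale argument would have to be a boost::locale-generated locale.

```cpp
#include <locale>
#include <regex>
#include <string>

// Hypothetical helper: returns true if `text` fully matches `pattern`.
// The regex is imbued with `loc` before the pattern is compiled, so the
// traits use that locale's ctype facet when icase is requested.
bool matches(const std::string& text, const std::string& pattern,
             const std::locale& loc, bool icase) {
    std::regex re;
    re.imbue(loc);  // must happen before assign(); imbue resets the regex
    std::regex::flag_type flags = std::regex::ECMAScript;
    if (icase) {
        flags = flags | std::regex::icase;
    }
    re.assign(pattern, flags);
    return std::regex_match(text, re);
}
```

Calling `matches("HELLO", "hello", std::locale::classic(), true)` is expected to succeed, while the same call with `icase` set to false is not. A locale whose ctype facet misbehaves would only affect the first call.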

Quuxplusone commented 5 years ago
Is it specific to the "de_DE.UTF-8" locale, or does it happen with others?
I'm thinking of other UTF-8 locales, like "en_US.UTF-8" or "fr_FR.UTF-8"
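
A rough way to answer this is to probe several locale names in a loop. The sketch below is an assumption-laden illustration, not code from this report: `Probe` and `probe_locale` are hypothetical names, the pattern is made up, and it constructs plain std::locale objects by name. Since the report says std::locale() matches, a faithful probe would build the locale with boost::locale::generator instead.

```cpp
#include <locale>
#include <regex>
#include <stdexcept>
#include <string>

// Hypothetical result type for one probed locale name.
enum class Probe { Unavailable, Matched, NotMatched };

// Tries an icase match under the named locale.
// Locales that are not installed are reported as Unavailable.
Probe probe_locale(const std::string& name) {
    std::locale loc;
    try {
        loc = std::locale(name);  // throws std::runtime_error if unknown
    } catch (const std::runtime_error&) {
        return Probe::Unavailable;
    }
    std::regex re;
    re.imbue(loc);
    re.assign("hello", std::regex::ECMAScript | std::regex::icase);
    return std::regex_match(std::string("HELLO"), re) ? Probe::Matched
                                                      : Probe::NotMatched;
}
```

Running `probe_locale` over names such as "de_DE.UTF-8", "en_US.UTF-8", and "fr_FR.UTF-8" would show whether the behaviour depends on the locale name. Which names are available depends on the system.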
Quuxplusone commented 5 years ago

Attached testReTraits.cpp (618 bytes, text/plain): Check all the characters for tolower
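
The attachment itself is not reproduced here. As a rough idea of what such a check can look like, the sketch below lists the ASCII characters that std::regex_traits<char>::translate_nocase changes under a given locale. The function name `chars_changed_by_fold` is hypothetical, and limiting the scan to the ASCII range is an assumption made to keep the result portable.

```cpp
#include <locale>
#include <regex>
#include <vector>

// Hypothetical helper: returns every ASCII character that the regex traits
// map to a different character when folding case under `loc`.
std::vector<char> chars_changed_by_fold(const std::locale& loc) {
    std::regex_traits<char> traits;
    traits.imbue(loc);
    std::vector<char> changed;
    for (int i = 0; i < 128; ++i) {
        const char c = static_cast<char>(i);
        if (traits.translate_nocase(c) != c) {
            changed.push_back(c);
        }
    }
    return changed;
}
```

For std::locale::classic() this yields exactly the 26 letters 'A' through 'Z'. A locale that returns a different set for the ASCII range would be a candidate explanation for a failing icase match.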

Quuxplusone commented 5 years ago

Whoops. I sent this to the wrong place. This should have been sent to https://reviews.llvm.org/D55746 instead. It may be related, but that is not certain yet.

Quuxplusone commented 5 years ago

Attached bug.cpp (7220 bytes, text/x-c++src): Expanded testcase that runs 192 locales (one of them segfaults when I run it) x 3 backends x 12 facets x icase on/off.

Quuxplusone commented 5 years ago

Attached result-csv.txt (674664 bytes, text/plain): Result CSV showing the test results for the locale combinations

Quuxplusone commented 5 years ago

Attached result-csv.ods (143022 bytes, application/vnd.oasis.opendocument.spreadsheet): Pivot analysis of the results