The POSIX Locale is, well, a _locale_. Therefore, you need to use the LOCALE
flag and bytestrings:
regex.findall(b'(?L)[[:punct:]]' , ascii_sorted.encode('ascii'))
On Unicode strings, [[:punct:]] is mapped to \p{Punct}, which uses the Unicode
definition of 'punctuation'.
Original comment by re...@mrabarnett.plus.com
on 9 Dec 2014 at 2:17
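The gap between the two definitions can be seen with the standard library alone. This is a minimal sketch, not the regex module's implementation: it assumes that string.punctuation stands in for [[:punct:]] in the C locale, and that \p{Punct} corresponds to the Unicode general categories beginning with "P".

```python
import string
import unicodedata

# ASCII characters treated as punctuation in the C locale.
posix_punct = string.punctuation

# Split them by Unicode general category: "P*" is punctuation, "S*" is symbol.
unicode_punct = [c for c in posix_punct if unicodedata.category(c).startswith("P")]
unicode_symbols = [c for c in posix_punct if unicodedata.category(c).startswith("S")]

print("Unicode punctuation:", "".join(unicode_punct))
print("Unicode symbols:    ", "".join(unicode_symbols))
```

Under those assumptions, characters such as $ and + count as punctuation in the C locale but as symbols in Unicode, so a str pattern using \p{Punct} would not be expected to match them.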
I thought that doing locale.setlocale(locale.LC_CTYPE, 'C') would set the
locale used by regex.
Why doesn't this work?
regex.findall(b'[[:punct:]]' , ascii_sorted.encode('ascii'), flags=regex.ASCII)
Original comment by plane...@gmail.com
on 10 Dec 2014 at 3:28
The re module requires the LOCALE flag in order to make \w, \s and \b (and
their complements) locale-sensitive.
The regex module is intended to be compatible with the re module, and it merely
adds some more character classes.
Original comment by re...@mrabarnett.plus.com
on 10 Dec 2014 at 11:18
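The re behaviour described above can be sketched with the standard library. The sample bytes are made up for illustration and contain only ASCII, so the result should not depend on which locale happens to be active.

```python
import re

# Hypothetical sample input, ASCII only.
data = b"foo_bar baz!"

# With a bytes pattern, the LOCALE flag makes \w follow the current locale.
words_locale = re.findall(rb"\w+", data, flags=re.LOCALE)

# Without the flag, a bytes pattern uses ASCII-only matching for \w.
words_ascii = re.findall(rb"\w+", data)

print(words_locale)
print(words_ascii)
```

The two calls only diverge for bytes above 0x7F, where the locale decides whether a byte counts as a word character.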
According to the Python docs, the LOCALE flag will be going away.
https://docs.python.org/3.5/library/re.html#re.L
re.L
re.LOCALE
Make \w, \W, \b, \B, \s and \S dependent on the current locale. The use of this flag is discouraged as the locale mechanism is very unreliable, and it only handles one "culture" at a time anyway; you should use Unicode matching instead, which is the default in Python 3 for Unicode (str) patterns. This flag makes sense only with bytes patterns.
Deprecated since version 3.5, will be removed in version 3.6: Deprecated the use of re.LOCALE with string patterns or re.ASCII.
Original comment by plane...@gmail.com
on 12 Dec 2014 at 3:57
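Note that the quoted deprecation covers only two combinations: re.LOCALE with a str pattern, and re.LOCALE together with re.ASCII. A small sketch of how this appears to play out, assuming Python 3.6 or later (on 3.5 these only emitted a deprecation warning):

```python
import re

str_error = None
ascii_error = None

# LOCALE with a str pattern is rejected.
try:
    re.compile(r"\w+", re.LOCALE)
except ValueError as exc:
    str_error = str(exc)

# LOCALE combined with ASCII is rejected, even for a bytes pattern.
try:
    re.compile(rb"\w+", re.LOCALE | re.ASCII)
except ValueError as exc:
    ascii_error = str(exc)

# LOCALE on its own with a bytes pattern is still accepted.
locale_pattern = re.compile(rb"\w+", re.LOCALE)

print(str_error)
print(ascii_error)
```

So the flag itself remains available for bytes patterns, which is the usage suggested at the top of this thread.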
Original issue reported on code.google.com by
plane...@gmail.com
on 9 Dec 2014 at 2:54