ThomasDickey / original-mawk

bug-reports for mawk (originally on GoogleCode)
http://invisible-island.net/mawk/mawk.html

UTF-8: multiple bracket expressions containing character classes aren't matched #62

Open McDutchie opened 4 years ago

McDutchie commented 4 years ago

In UTF-8 locales, multiple bracket expressions containing character classes aren't matched correctly.

Symptoms:

$ echo 'é' | mawk '/[[:alpha:]]/'               # ok
é
$ echo 'éé' | mawk '/[[:alpha:]][[:alpha:]]/'   # NOT ok
(no output)

System: macOS 10.14.6. Using mawk @1.3.4-20171017_1 from MacPorts.

My locale:

$ locale
LANG="nl_NL.UTF-8"
LC_COLLATE="nl_NL.UTF-8"
LC_CTYPE="nl_NL.UTF-8"
LC_MESSAGES="nl_NL.UTF-8"
LC_MONETARY="nl_NL.UTF-8"
LC_NUMERIC="nl_NL.UTF-8"
LC_TIME="nl_NL.UTF-8"
LC_ALL=

In an ISO-8859-1 locale (LANG="nl_NL.ISO8859-1") this works fine.
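
Since the only difference between the two locales is the encoding, it may help to look at the bytes awk actually receives. The following is a minimal sketch, not a diagnosis: `hexbytes` is a hypothetical helper name, and whether mawk's matcher works on bytes rather than characters is an assumption that this report does not confirm. The input is written with octal escapes so that the result does not depend on the terminal's encoding.

```shell
# Hypothetical helper (not part of mawk): dump stdin as hex bytes.
hexbytes() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# 'é' encoded as UTF-8 occupies two bytes.
printf '\303\251' | hexbytes          # prints c3a9

# 'éé' encoded as UTF-8 occupies four bytes.
printf '\303\251\303\251' | hexbytes  # prints c3a9c3a9

# 'é' encoded as ISO-8859-1 occupies a single byte.
printf '\351' | hexbytes              # prints e9
```

If the matcher does test each byte separately, two consecutive [[:alpha:]] expressions would have to match two adjacent bytes of the UTF-8 input, which is not the same as matching two adjacent characters.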

McDutchie commented 4 years ago

See also: https://github.com/onetrueawk/awk/issues/45

ThomasDickey commented 4 years ago

This is a duplicate of issue #10.