ThomasDickey / original-mawk

bug-reports for mawk (originally on GoogleCode)
http://invisible-island.net/mawk/mawk.html

characters-as-bytes #43

Open kmatt opened 7 years ago

kmatt commented 7 years ago

When attempting to precisely measure, or split an input file by, byte size using length(), it's difficult, because mawk appears to behave by default like gawk with the --characters-as-bytes (-b) option.
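For illustration only, here is a minimal Python sketch of the difference between counting characters and counting bytes, plus a byte-budget split of the kind described above. It assumes UTF-8 input and newline-terminated records (awk's default RS), and it adds 1 per line because awk strips the record separator from $0 before length() sees it. The names byte_len, split_by_bytes and max_bytes are hypothetical; this is not how mawk or gawk are implemented.

```python
def byte_len(line: str, encoding: str = "utf-8") -> int:
    """Size of the line in bytes once encoded, i.e. what a
    bytes-as-characters length() would report for it."""
    return len(line.encode(encoding))


def split_by_bytes(lines, max_bytes, encoding="utf-8"):
    """Group lines into chunks whose encoded size (counting one '\\n'
    per line) stays within max_bytes. A single line larger than
    max_bytes still gets a chunk of its own."""
    chunks, current, size = [], [], 0
    for line in lines:
        n = byte_len(line, encoding) + 1  # +1 for the newline awk strips from $0
        if current and size + n > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append(line)
        size += n
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    s = "na\u00efve caf\u00e9"  # "naïve café"
    print(len(s), byte_len(s))  # character count vs UTF-8 byte count
    print(split_by_bytes(["abc", "\u00e9", "defg"], max_bytes=6))
```

The two counts disagree as soon as the input contains a non-ASCII character, which is exactly where a byte-based and a character-based length() diverge.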

Since changing this could break a bunch of scripts that expect the default Mawk behavior, perhaps a "-W bytes_not_characters" option would be useful.

stephane-chazelas commented 7 years ago

AFAIK, mawk still doesn't support multibyte characters. If multibyte support is added one day, it seems to me that a better, portable way to make sure bytes == characters is to use LC_CTYPE=C (though that could have an impact on error messages in non-English locales).
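As a rough model of why LC_CTYPE=C makes bytes and characters coincide: in a single-byte character set every byte decodes to exactly one character, whereas a UTF-8 locale may combine several bytes into one character. The Python sketch below uses Latin-1 as a stand-in for the byte-per-character interpretation, because Python's ASCII codec rejects bytes above 0x7F; it only models the idea and does not call any awk. The shell-level equivalent would be running the awk with LC_CTYPE=C in its environment and LC_ALL unset, since LC_ALL overrides LC_CTYPE.

```python
def char_count(data: bytes, encoding: str) -> int:
    """Number of characters when `data` is decoded with `encoding`."""
    return len(data.decode(encoding))


raw = "caf\u00e9".encode("utf-8")  # b'caf\xc3\xa9', 5 bytes

# Multibyte-aware interpretation (like a UTF-8 locale): 4 characters.
utf8_chars = char_count(raw, "utf-8")

# Byte-per-character interpretation (stand-in for LC_CTYPE=C): Latin-1
# maps every byte 0x00-0xFF to exactly one character, so chars == bytes.
single_byte_chars = char_count(raw, "latin-1")

print(len(raw), utf8_chars, single_byte_chars)
```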