Open kmatt opened 7 years ago
AFAIK, mawk still doesn't support multibyte characters. If multibyte character support is added one day, it seems to me that a better, portable way to make sure bytes == characters is to use LC_CTYPE=C (though that could have an impact on error messages in non-English locales).
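
A minimal sketch of the LC_CTYPE=C approach, assuming a POSIX shell and whichever `awk` is first on the PATH. The sample string and the variable names (`s`, `byte_len`, `wc_len`) are made up for illustration. `LC_ALL` is cleared on the awk command because, when set, it overrides `LC_CTYPE`.

```shell
# Sample string "héllo" in UTF-8: "é" is the two bytes 0xC3 0xA9,
# so the string is 5 characters but 6 bytes.
s=$(printf 'h\303\251llo')

# Force the C locale for character classification so that length()
# counts bytes, even in an awk that is multibyte-aware.
byte_len=$(printf '%s\n' "$s" | LC_ALL= LC_CTYPE=C awk '{ print length($0) }')

# Cross-check against wc -c, which always counts bytes.
# tr strips the padding some wc implementations add.
wc_len=$(printf '%s' "$s" | wc -c | tr -d ' ')

echo "awk length under LC_CTYPE=C: $byte_len"
echo "wc -c byte count:            $wc_len"
```

If the two numbers agree, length() is counting bytes for that awk in that environment. This only checks one implementation at a time, so it is worth running against each awk a script has to support.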
When attempting to precisely measure or split an input file by byte size using length(), it's difficult, as mawk appears to use the same behavior as the gawk --characters-as-bytes (-b) option by default.
Since changing this could break a bunch of scripts that expect the default mawk behavior, perhaps a "-W bytes_not_characters" option would be useful.