Open hnakamur opened 12 years ago
Could you explain the subject in a bit more detail? TIA
@dvv I just changed the title to MultiByte Encoding Support
Strings in Lua may contain any 8-bit value, including embedded zeros, which can be specified as ‘\0’, according to http://www.lua.org/ftp/refman-5.0.pdf
So we can use any multibyte character encoding, such as Shift_JIS or EUC-JP, as well as UTF-8. Therefore we need APIs for converting encodings in strings or cBuffers. We'd like those APIs to be
We would like to define APIs that satisfy these goals. So we need more than just a simple API like convert(src, dest_encoding) that returns dest.
@dvv I hope this explanation is clear enough.
We should decide on rules for the encodings of strings and cBuffers.
This is just an idea, but my take is to use only UTF-8 for strings and allow any encoding for cBuffers. We would then add APIs to cBuffers for interoperability with strings, so that we can pass cBuffers to APIs which expect strings.
However, I have not thought it through yet, so I'm not sure this actually works. Maybe allowing any encoding for both strings and cBuffers is a better way.
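To make the first option a bit more concrete, here is a rough sketch. It is written in Python only because its standard codecs make it runnable as-is; the real thing would be Lua/C. EncodedBuffer, from_utf8 and to_utf8 are hypothetical names for illustration, not an existing or proposed API.

```python
class EncodedBuffer:
    """Hypothetical stand-in for a cBuffer: raw bytes plus an encoding label."""

    def __init__(self, data: bytes, encoding: str):
        self.data = data          # bytes in the buffer's own encoding
        self.encoding = encoding  # e.g. "shift_jis", "euc_jp", "utf-8"

    @classmethod
    def from_utf8(cls, utf8: bytes, encoding: str) -> "EncodedBuffer":
        # Build a buffer in `encoding` from a UTF-8 "string".
        return cls(utf8.decode("utf-8").encode(encoding), encoding)

    def to_utf8(self) -> bytes:
        # Produce the UTF-8 form that string-only APIs would accept.
        return self.data.decode(self.encoding).encode("utf-8")


utf8_text = "日本語".encode("utf-8")
sjis_buf = EncodedBuffer.from_utf8(utf8_text, "shift_jis")
round_trip = sjis_buf.to_utf8()
```

Under this assumption, any API that expects a string would only ever see the result of to_utf8(), while the buffer keeps its original bytes.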
Something like a Unix pipe would be best, for example, shift_jis -> UTF-8 -> to_lower.
We should avoid iconv because of license incompatibility. Maybe we can use some parts of libnkf, bsdconv, or PHP's mbstring.
This is a very important subject, so let's take the time to consider it carefully.