connectFree / lev

Levitate your app with Lev!
Apache License 2.0

MultiByte Encoding Support #81

Open hnakamur opened 12 years ago

hnakamur commented 12 years ago

Something like a Unix pipe would be best, for example: Shift_JIS -> UTF-8 -> to_lower.
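To make the pipe idea concrete, here is a minimal sketch in Python rather than Lev's own API. The names `pipeline`, `transcode`, and `utf8_to_lower` are hypothetical, and Python's built-in codecs stand in for whatever conversion library Lev would end up using. Each stage takes bytes and returns bytes, so stages can be chained like commands in a shell pipe.

```python
from typing import Callable

# A stage maps a byte string to a byte string, like one command in a Unix pipe.
Stage = Callable[[bytes], bytes]


def transcode(src_encoding: str, dest_encoding: str) -> Stage:
    """Hypothetical stage: re-encode bytes from src_encoding to dest_encoding."""
    def stage(data: bytes) -> bytes:
        return data.decode(src_encoding).encode(dest_encoding)
    return stage


def utf8_to_lower(data: bytes) -> bytes:
    """Hypothetical stage: lowercase UTF-8 encoded text."""
    return data.decode("utf-8").lower().encode("utf-8")


def pipeline(*stages: Stage) -> Stage:
    """Compose stages left to right, feeding each output into the next stage."""
    def run(data: bytes) -> bytes:
        for stage in stages:
            data = stage(data)
        return data
    return run


# Shift_JIS -> UTF-8 -> to_lower, in the order given above.
sjis_to_lower_utf8 = pipeline(transcode("shift_jis", "utf-8"), utf8_to_lower)
```

Because every stage has the same bytes-in, bytes-out shape, adding or reordering steps does not require touching the other stages.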

We should avoid iconv because of license incompatibility. Maybe we can use some parts of libnkf, bsdconv, or PHP's mbstring.

This is a very important subject, so let's take time for consideration.

dvv commented 12 years ago

could you explain the subject in a bit more detail? tia

kristate commented 12 years ago

@dvv I just changed the title to MultiByte Encoding Support

hnakamur commented 12 years ago

Strings in Lua may contain any 8-bit value, including embedded zeros, which can be specified as ‘\0’, according to http://www.lua.org/ftp/refman-5.0.pdf

So we can use any multibyte character encoding such as Shift_JIS or EUC-JP as well as UTF-8. Therefore we need APIs for converting encodings in strings or cBuffers. We'd like those APIs to be

We would like to define APIs that satisfy these goals. So we need more than just a simple API like convert(src, dest_encoding) that returns dest.
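As one illustration of where a one-shot `convert` can fall short (this is my own assumption, not necessarily one of the goals meant above): when data arrives in chunks, a multibyte character may be split across a chunk boundary. The sketch below is Python, not Lev code. `convert` and `convert_stream` are hypothetical names, and `convert` takes an explicit source encoding, which the signature mentioned above does not. It relies on the incremental codecs in Python's standard `codecs` module.

```python
import codecs
from typing import Iterable, Iterator


def convert(src: bytes, src_encoding: str, dest_encoding: str) -> bytes:
    """One-shot conversion: needs the whole input and fails on a split character."""
    return src.decode(src_encoding).encode(dest_encoding)


def convert_stream(chunks: Iterable[bytes], src_encoding: str,
                   dest_encoding: str) -> Iterator[bytes]:
    """Chunked conversion: keeps incomplete byte sequences until the next chunk."""
    decoder = codecs.getincrementaldecoder(src_encoding)()
    encoder = codecs.getincrementalencoder(dest_encoding)()
    for chunk in chunks:
        # The decoder buffers a trailing partial character internally.
        text = decoder.decode(chunk)
        if text:
            yield encoder.encode(text)
    # Flush both sides; raises UnicodeDecodeError if the input ended mid-character.
    tail = encoder.encode(decoder.decode(b"", final=True), final=True)
    if tail:
        yield tail
```

With Shift_JIS input cut in the middle of a two-byte character, `convert` on the first chunk alone raises `UnicodeDecodeError`, while `convert_stream` carries the dangling byte over to the next chunk.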

@dvv I hope this explanation is clear enough.

hnakamur commented 12 years ago

We should decide rules for encodings for strings and cBuffers.

This is just an idea, but my take is to use only UTF-8 for strings and allow any encoding for cBuffers. We would then add APIs to cBuffers for interoperability with strings, so that we can pass cBuffers to APIs that expect strings.
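A rough sketch of that rule, in Python rather than Lev's actual cBuffer API: the class `EncodedBuffer` and its methods `to_utf8` and `from_utf8` are hypothetical names for illustration. The buffer carries raw bytes plus the name of their encoding, and anything that crosses into string territory is converted to UTF-8 first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedBuffer:
    """Hypothetical stand-in for a cBuffer: raw bytes tagged with their encoding."""
    data: bytes
    encoding: str

    def to_utf8(self) -> bytes:
        """Return the content as UTF-8, the only encoding strings would use."""
        return self.data.decode(self.encoding).encode("utf-8")

    @classmethod
    def from_utf8(cls, text: bytes, encoding: str) -> "EncodedBuffer":
        """Build a buffer in the given encoding from UTF-8 encoded text."""
        return cls(text.decode("utf-8").encode(encoding), encoding)


def string_api(utf8_text: bytes) -> int:
    """Hypothetical API that expects a UTF-8 string; counts its characters."""
    return len(utf8_text.decode("utf-8"))


# A buffer in EUC-JP is converted at the boundary before reaching a string API.
buf = EncodedBuffer("日本語".encode("euc_jp"), "euc_jp")
char_count = string_api(buf.to_utf8())
```

The design choice here is that string-facing code never has to ask which encoding it was handed. Only the buffer type knows about encodings other than UTF-8.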

However, I have not thought it through yet, so I'm not sure this would actually work. Maybe allowing any encoding for both strings and cBuffers is a better way.

dvv commented 12 years ago

right. as previously stated, imho it's better to keep things small and clean as far as we can, so utf-8 should be enough for starters. i have something related to this domain -- unicode -- but i don't know whether it's useful.