agnoster / base32-js

Base32 encoding for JavaScript, based (loosely) on Crockford's Base32
https://github.com/agnoster/base32-js
MIT License

[EXPIRED] Cross-cultural compatibility from the box #10

Closed sergeevabc closed 7 years ago

sergeevabc commented 9 years ago

Dear Isaac, console.log(base32.decode(base32.encode("love"))); works as expected and returns love, hooray.

But let's step outside the Anglo-Saxon world and type something in Cyrillic. For example, the Russian equivalent of the word love is любовь, yet console.log(base32.decode(base32.encode("любовь"))); returns a garbled piece of text: ?OªµÏ \Õ?Ukž¹²L.
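
One plausible cause, not confirmed anywhere in this thread, is that a byte-oriented encoder receives UTF-16 code units that do not fit in a single byte. The sketch below is only an illustration under that assumption. It does not call base32-js and does not reproduce the exact garbled output above; it just shows that every code unit of the Cyrillic word exceeds 255, so cutting each one down to 8 bits loses information.

```javascript
// Assumption for illustration: an encoder that keeps only the low 8 bits
// of each UTF-16 code unit. This is NOT taken from the base32-js source.
const word = "любовь";

// UTF-16 code units of the word (all Cyrillic letters here are in the BMP).
const codeUnits = Array.from(word, (ch) => ch.charCodeAt(0));

// Simulate truncating every code unit to a single byte.
const truncated = codeUnits.map((unit) => unit & 0xff);
const lossy = String.fromCharCode(...truncated);

console.log(codeUnits.every((unit) => unit > 255)); // true
console.log(lossy === word); // false, the high bits are gone
```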

There are various workarounds for this issue. I like the following two:

var UTF8 = {
    encode: function(string) {
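        // Note: this normalizes CRLF to LF, so input with Windows line endings does not round-trip exactly.
        string = string.replace(/\r\n/g, "\n");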
        var utftext = "";
        for (var n = 0; n < string.length; n++) {
            var c = string.charCodeAt(n);
            if (c < 128) {
                utftext += String.fromCharCode(c);
            } else if ((c > 127) && (c < 2048)) {
                utftext += String.fromCharCode((c >> 6) | 192);
                utftext += String.fromCharCode((c & 63) | 128);
            } else {
                utftext += String.fromCharCode((c >> 12) | 224);
                utftext += String.fromCharCode(((c >> 6) & 63) | 128);
                utftext += String.fromCharCode((c & 63) | 128);
            }
        }
        return utftext;
    },
    decode: function(utftext) {
        var string = "",
            i = 0,
            c = 0,
            c2 = 0,
            c3 = 0; // declared here so it does not leak as an implicit global
        while (i < utftext.length) {
            c = utftext.charCodeAt(i);
            if (c < 128) {
                string += String.fromCharCode(c);
                i++;
            } else if ((c > 191) && (c < 224)) {
                c2 = utftext.charCodeAt(i + 1);
                string += String.fromCharCode(((c & 31) << 6) | (c2 & 63));
                i += 2;
            } else {
                c2 = utftext.charCodeAt(i + 1);
                c3 = utftext.charCodeAt(i + 2);
                string += String.fromCharCode(((c & 15) << 12) | ((c2 & 63) << 6) | (c3 & 63));
                i += 3;
            }
        }
        return string;
    }
};

Also, there's the Encoding API.

const txtencoder = new TextEncoder();
const message = "любовь";
txtencoder.encode(message); // returns a UTF-8 Uint8Array
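
For completeness, here is a rough sketch of how the Encoding API could serve as a full round-trip helper. The names toByteString and fromByteString are hypothetical and not part of base32-js or the Encoding API. The idea is to turn the text into a string where every character holds exactly one UTF-8 byte before encoding, and to reverse that after decoding. Whether base32-js handles such byte strings correctly is an assumption I have not verified.

```javascript
// Hypothetical helper: text -> string whose char codes are the UTF-8 bytes.
function toByteString(text) {
  const bytes = new TextEncoder().encode(text); // Uint8Array of UTF-8 bytes
  let out = "";
  for (const b of bytes) {
    out += String.fromCharCode(b); // each char code is in 0..255
  }
  return out;
}

// Hypothetical helper: byte string -> original text.
function fromByteString(byteString) {
  const bytes = Uint8Array.from(byteString, (ch) => ch.charCodeAt(0));
  return new TextDecoder("utf-8").decode(bytes);
}

const packed = toByteString("любовь");
// A byte-oriented codec would be applied to `packed` at this point.
console.log(packed.length); // 12 (six letters, two UTF-8 bytes each)
console.log(fromByteString(packed)); // любовь
```

TextEncoder always produces UTF-8, so unlike the hand-written helper above it also covers characters outside the Basic Multilingual Plane, such as emoji.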

Could you be so kind as to make it work out of the box, without those additional helpers, so that users around the globe can enjoy your library without any hassle? The developer of a Base91 implementation has already done so.