coolaj86 / TextEncoderLite_tmp

Polyfill for the Encoding Living Standard's API
Apache License 2.0

Incorrect implementation for multibyte chars #9

Closed by felixhammerl 6 years ago

felixhammerl commented 6 years ago

Looks like you've inherited a bug from https://github.com/feross/buffer; see https://github.com/feross/buffer/issues/164.

felixhammerl commented 6 years ago

I've stripped the original implementation down to UTF-8 only, and it works across the entire UTF-8 range; see https://github.com/emailjs/emailjs-stringencoding.

There's still a good chunk of stream-related code that could be removed, but so far it works.
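Covering the entire UTF-8 range means emitting 1- to 4-byte sequences, including the 4-byte case for characters outside the BMP, which JavaScript strings store as UTF-16 surrogate pairs. A minimal sketch of such an encoder, assuming a modern JS runtime: it is not the emailjs-stringencoding code, `utf8Encode` is a hypothetical name, and replacing lone surrogates with U+FFFD is an assumption chosen to match the native TextEncoder.

```javascript
// Hypothetical UTF-8-only encoder sketch: maps each Unicode code point of a
// JS string to 1-4 UTF-8 bytes (bit layout per RFC 3629).
function utf8Encode(str) {
  const bytes = [];
  // for...of iterates by code point, so a surrogate pair such as "😀"
  // (U+1F600) arrives as one value instead of two UTF-16 halves.
  for (const ch of str) {
    let cp = ch.codePointAt(0);
    // A lone surrogate has no UTF-8 form; substitute U+FFFD (assumption,
    // mirrors what TextEncoder does).
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;
    if (cp < 0x80) {
      bytes.push(cp); // 1 byte: 0xxxxxxx
    } else if (cp < 0x800) {
      bytes.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2 bytes
    } else if (cp < 0x10000) {
      bytes.push(
        0xe0 | (cp >> 12),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      ); // 3 bytes
    } else {
      bytes.push(
        0xf0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3f),
        0x80 | ((cp >> 6) & 0x3f),
        0x80 | (cp & 0x3f)
      ); // 4 bytes: code points above the BMP
    }
  }
  return Uint8Array.from(bytes);
}

console.log(Array.from(utf8Encode("A€😀"), (b) => b.toString(16)).join(" "));
// prints "41 e2 82 ac f0 9f 98 80"
```

The bit-twiddling branches are the whole encoder; everything else in a full Encoding Standard polyfill (other encodings, streaming state) is what can be dropped for a UTF-8-only use case.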

Ruffio commented 6 years ago

@felixhammerl have you got a test case showing the issue? It would be nice to have a unit test verifying the output.
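A unit test of the kind asked for here could pin the encoder's output to known UTF-8 byte sequences, one per sequence length, and also check the round trip. A sketch, with hypothetical names (`VECTORS`, `checkUtf8`); the encode/decode pair under test is passed in, and the example runs it against the native TextEncoder/TextDecoder, assuming Node 11+ or a modern browser.

```javascript
// Known-answer vectors: string -> expected UTF-8 bytes (hex).
const VECTORS = [
  ["A", "41"],           // 1 byte  (U+0041)
  ["é", "c3 a9"],        // 2 bytes (U+00E9)
  ["€", "e2 82 ac"],     // 3 bytes (U+20AC)
  ["😀", "f0 9f 98 80"], // 4 bytes (U+1F600, a surrogate pair in UTF-16)
];

const toHex = (bytes) =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");

// Runs every vector through `encode` and `decode` (the implementation under
// test) and returns a list of human-readable failure messages.
function checkUtf8(encode, decode) {
  const failures = [];
  for (const [str, expected] of VECTORS) {
    const got = toHex(encode(str));
    if (got !== expected) {
      failures.push(`encode(${JSON.stringify(str)}) = ${got}, want ${expected}`);
    }
    const back = decode(encode(str));
    if (back !== str) {
      failures.push(`round trip of ${JSON.stringify(str)} gave ${JSON.stringify(back)}`);
    }
  }
  return failures;
}

// Example: the native Encoding API should produce no failures.
const enc = new TextEncoder();
const dec = new TextDecoder();
console.log(checkUtf8((s) => enc.encode(s), (b) => dec.decode(b))); // prints []
```

An implementation with the multibyte bug would show up as a non-empty failure list, most likely on the 3- or 4-byte vectors.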

felixhammerl commented 6 years ago

Oh, I didn't realize this was still open. You can replace this entire repo with these two one-liners:

unescape(encodeURIComponent( ... ))  // encode: JS string -> UTF-8 byte string
decodeURIComponent(escape( ... ))    // decode: UTF-8 byte string -> JS string

http://ecmanaut.blogspot.de/2006/07/encoding-decoding-utf8-in-javascript.html
http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html
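To see why the two one-liners work, it helps to look at the intermediate value: a "binary string" holding one JS char per UTF-8 byte, each with a char code in 0x00-0xFF. A sketch wrapping the one-liners in helpers; `toUtf8BinaryString` and `fromUtf8BinaryString` are hypothetical names, not part of any library.

```javascript
// String -> binary string with one char per UTF-8 byte.
function toUtf8BinaryString(str) {
  // encodeURIComponent emits UTF-8 bytes as %XX escapes: "é" -> "%C3%A9".
  // unescape turns each %XX into the single char with code 0xXX.
  return unescape(encodeURIComponent(str));
}

// Binary string -> original string (the inverse).
function fromUtf8BinaryString(bin) {
  // escape turns each char in 0x80-0xFF (and some ASCII) back into %XX,
  // and decodeURIComponent reassembles the UTF-8 sequences into characters.
  return decodeURIComponent(escape(bin));
}

const bin = toUtf8BinaryString("é€");
console.log(Array.from(bin, (c) => c.charCodeAt(0).toString(16)).join(" "));
// prints "c3 a9 e2 82 ac"
console.log(fromUtf8BinaryString(bin)); // prints "é€"
```

Both directions can throw a URIError: the encode side on a lone surrogate, the decode side on malformed UTF-8. Note also that escape and unescape are legacy functions specified in ECMAScript Annex B.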

It's even there in the original implementation's test files:

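// Inspired by:
// http://ecmanaut.blogspot.com/2006/07/encoding-decoding-utf8-in-javascript.html

// String -> UTF-8 bytes. encodeURIComponent percent-encodes each UTF-8 byte
// ("é" -> "%C3%A9") and unescape maps each %XX to the char with code 0xXX.
// Throws URIError if the string contains a lone surrogate.
function encode_utf8(string) {
  var utf8 = unescape(encodeURIComponent(string));
  var octets = new Uint8Array(utf8.length), i;
  for (i = 0; i < utf8.length; i += 1) {
    octets[i] = utf8.charCodeAt(i);
  }
  return octets;
}

// UTF-8 bytes -> string. escape re-creates the %XX form and
// decodeURIComponent reassembles it; throws URIError on malformed UTF-8.
function decode_utf8(octets) {
  var utf8 = '', i, CHUNK = 0x8000;
  // Build the byte string in chunks: a single apply() over a large array
  // can exceed the engine's argument limit and throw a RangeError.
  for (i = 0; i < octets.length; i += CHUNK) {
    utf8 += String.fromCharCode.apply(null,
      Array.prototype.slice.call(octets, i, i + CHUNK));
  }
  return decodeURIComponent(escape(utf8));
}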
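One behavioral difference to check before substituting the one-liners for a TextDecoder polyfill: on malformed input, the decodeURIComponent(escape(...)) route throws a URIError, while the Encoding Standard's TextDecoder substitutes U+FFFD by default and only throws (a TypeError) when constructed with `fatal: true`. A minimal check, assuming a runtime with a native TextDecoder (Node 11+ or a modern browser); `decodeViaUri` is a hypothetical helper that reproduces the decode_utf8 approach.

```javascript
// A truncated 2-byte sequence: 0xC3 is a lead byte with no continuation byte.
const bad = new Uint8Array([0x41, 0xc3]);

// Same route as decode_utf8: bytes -> binary string -> escape -> decode.
function decodeViaUri(octets) {
  const bin = String.fromCharCode.apply(null, Array.from(octets));
  return decodeURIComponent(escape(bin));
}

try {
  decodeViaUri(bad);
} catch (e) {
  console.log(e instanceof URIError); // prints true
}

// TextDecoder replaces the malformed sequence with U+FFFD by default...
console.log(new TextDecoder().decode(bad) === "A\uFFFD"); // prints true

// ...and throws only when constructed with { fatal: true }.
try {
  new TextDecoder("utf-8", { fatal: true }).decode(bad);
} catch (e) {
  console.log(e instanceof TypeError); // prints true
}
```

If callers rely on lenient decoding, the one-liner approach needs a try/catch around it; if they rely on strict decoding, it already fails loudly.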