anonyco / FastestSmallestTextEncoderDecoder

The fastest smallest JavaScript polyfill for encodeInto of TextEncoder, encode of TextEncoder, and decode of TextDecoder for UTF-8 only.
https://anonyco.github.io/FastestSmallestTextEncoderDecoder/gh-pages/
Creative Commons Zero v1.0 Universal
145 stars, 33 forks

TextEncoder.encode() does not properly handle out-of-order low surrogate #11

Closed (opatomic closed this issue 4 years ago)

opatomic commented 4 years ago

When an out-of-order (unpaired) surrogate is encountered, it should be replaced with the replacement character U+FFFD (UTF-8 bytes 0xEF 0xBF 0xBD).

Example:

console.log((new TextEncoder()).encode("\uDC00"));

result: Uint8Array(3) [ 237, 176, 128 ]
expected: Uint8Array(3) [ 239, 191, 189 ]
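For illustration, here is a minimal sketch of an encoder that gives the expected result. The function name encodeUtf8Sketch is hypothetical, and this is not the library's actual code or its fix, just one straightforward way to substitute U+FFFD for any surrogate that is not part of a valid high/low pair.

```javascript
// Hypothetical helper (not the library's implementation): encode a JS string
// as UTF-8, replacing every unpaired surrogate with U+FFFD (0xEF 0xBF 0xBD).
function encodeUtf8Sketch(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    let cp = str.charCodeAt(i);
    if (cp >= 0xD800 && cp <= 0xDBFF) {
      // High surrogate: valid only if a low surrogate follows immediately.
      const next = i + 1 < str.length ? str.charCodeAt(i + 1) : 0;
      if (next >= 0xDC00 && next <= 0xDFFF) {
        cp = 0x10000 + ((cp - 0xD800) << 10) + (next - 0xDC00);
        i++; // consume the low surrogate as part of this pair
      } else {
        cp = 0xFFFD; // lone high surrogate
      }
    } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
      cp = 0xFFFD; // low surrogate with no preceding high surrogate
    }
    // Emit the code point as 1 to 4 UTF-8 bytes.
    if (cp < 0x80) {
      bytes.push(cp);
    } else if (cp < 0x800) {
      bytes.push(0xC0 | (cp >> 6), 0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      bytes.push(0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F));
    } else {
      bytes.push(
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F)
      );
    }
  }
  return Uint8Array.from(bytes);
}

// The case from this report: a lone low surrogate becomes bytes 239, 191, 189.
console.log(encodeUtf8Sketch("\uDC00"));
```

Valid surrogate pairs still combine into a single four-byte sequence, so only unpaired halves are affected.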

anonyco commented 4 years ago

Funny, I just discovered a Chrome bug: typing "\uDC00" into the Chrome console causes Chrome to crash. It seems Chrome dislikes incomplete surrogates just as much as this library does, lol: https://bugs.chromium.org/p/chromium/issues/detail?id=1092264