IvanMathy / Boop

A scriptable scratchpad for developers. In slow yet steady progress.
https://boop.okat.best
MIT License

Support TextEncoder and TextDecoder API #25

Open · fnky opened this issue 4 years ago

fnky commented 4 years ago

Working with UTF-8 in JavaScript requires a lot of code and is very error prone. The TextEncoder and TextDecoder APIs would make it much easier to convert between strings and bytes.

This is especially useful when you work with Uint8Array, for example to encode UTF-8 strings as hexadecimal and back.
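For context, in a runtime that does ship these APIs (browsers, Node.js), the hex round trip looks like this:

```javascript
// Encode a string to UTF-8 bytes, then to hex, and back again.
const bytes = new TextEncoder().encode("héllo"); // Uint8Array of UTF-8 bytes
const hex = Array.from(bytes, b => b.toString(16).padStart(2, "0")).join("");

// Decode: hex -> bytes -> string
const back = new Uint8Array(hex.match(/../g).map(h => parseInt(h, 16)));
const text = new TextDecoder("utf-8").decode(back);
```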

IvanMathy commented 4 years ago

Interesting. It seems like JavaScriptCore (which runs the scripts) does not support it. I don't believe I'd be able to add that in, but I can look into it.

In the meantime, you can probably use a polyfill to be able to access that functionality.

fnky commented 4 years ago

> Seems like JavascriptCore (which runs the scripts) does not support it

The APIs are not part of the ECMAScript specification; they are web standards defined in the WHATWG Encoding specification.

> you can probably use a polyfill to be able to access that functionality

The available polyfills don't usually support full UTF-8, because punycode is a beast to polyfill, so they get around it by not handling those cases, unfortunately.

Edit: there is a polyfill that supports only UTF-8, not other encodings.
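To illustrate why: a hand-rolled UTF-8 encoder (roughly what a polyfill has to get right) already involves several branches, and surrogate handling is easy to get wrong. A minimal sketch:

```javascript
// Minimal hand-rolled UTF-8 encoder, to show the edge cases a polyfill
// must handle: 1- to 4-byte sequences and surrogate pairs.
function utf8Encode(str) {
  const out = [];
  for (const ch of str) { // for..of iterates by code point, pairing surrogates
    const cp = ch.codePointAt(0);
    if (cp < 0x80) {
      out.push(cp); // 1-byte ASCII
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f)); // 2-byte sequence
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f),
               0x80 | (cp & 0x3f)); // 3-byte sequence
    } else {
      out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
               0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f)); // 4-byte
    }
  }
  return new Uint8Array(out);
}
```

And that still ignores decoding, invalid input, and every non-UTF-8 label TextDecoder accepts.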

IvanMathy commented 4 years ago

Which use cases specifically are you interested in? I won't be able to implement everything, but if there are some general utility functions that would be useful, I'd be happy to fill the gap.

fnky commented 4 years ago

Both APIs are very useful when working with data in different encodings. An example would be encoding strings to hexadecimal for different encodings (UTF-8, UTF-16, UTF-32).

Given that the goal of Boop is to manipulate data, most often text, I could see value in adding some form of these APIs to support multiple locales that include special Unicode characters, avoiding mistakes during encoding and decoding.

IvanMathy commented 4 years ago

Would simple functions to go from a string to a UTF-[8/16/32] array and back be enough? If so, I can absolutely add that in. It might not be 1-to-1 with TextEncoder, but it should be enough to get there...

fnky commented 4 years ago

If the functions could encode and decode to and from Uint8Array, that would be good. I don't think UTF-16 and UTF-32 are supported by TextDecoder, and UTF-8 is the most commonly used. I believe strings in JavaScriptCore are stored as UTF-16 code units; TextEncoder takes such a string and encodes it to a Uint8Array of UTF-8 bytes.

The idea is just to provide functions for common scenarios that closely follow the ECMAScript/Web standards.

An API could be:

StringToUTF8Bytes(string: String) => Uint8Array
UTF8BytesToString(buffer: Uint8Array) => String
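As a rough sketch of what those two helpers could look like without TextEncoder/TextDecoder (the encodeURIComponent/decodeURIComponent trick is a well-known way to get at UTF-8 bytes in plain JavaScript; the function names are just the ones proposed above):

```javascript
// encodeURIComponent emits a %XX escape for every non-ASCII UTF-8 byte,
// so parsing its output yields the UTF-8 encoding of the string.
function StringToUTF8Bytes(str) {
  const escaped = encodeURIComponent(str);
  const bytes = [];
  for (let i = 0; i < escaped.length; i++) {
    if (escaped[i] === "%") {
      bytes.push(parseInt(escaped.slice(i + 1, i + 3), 16)); // escaped byte
      i += 2;
    } else {
      bytes.push(escaped.charCodeAt(i)); // plain ASCII character
    }
  }
  return new Uint8Array(bytes);
}

// Re-escape each byte and let decodeURIComponent reassemble code points.
function UTF8BytesToString(buffer) {
  let escaped = "";
  for (const b of buffer) {
    escaped += "%" + b.toString(16).padStart(2, "0");
  }
  return decodeURIComponent(escaped);
}
```

This throws on invalid UTF-8 input (decodeURIComponent raises URIError), which may or may not be the error behavior you'd want.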

fnky commented 4 years ago

I have found a JavaScript runtime project that includes an implementation of the Encoding specification and other browser-like APIs.

There's also the implementation used in Deno.

Perhaps this could be a good starting point.